Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Zhong Wang
+1 on @rick. Quality is really important... I am still encountering bugs
consistently.

On Tue, Mar 1, 2016 at 10:16 AM, TEJA SRIVASTAV 
wrote:

> +1 on @rick
>
> On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim  wrote:
>
>> I see in the Enterprise section that multi-tenancy will be included; will
>> this include user impersonation too? That way, the executing user would be
>> the user owning the process.
>>
>> On Mar 1, 2016, at 12:51 AM, Shabeel Syed  wrote:
>>
>> +1
>>
>> Hi Tamas,
>>Pluggable external visualization is really a GREAT feature to have.
>> I'm looking forward to this :)
>>
>> Regards
>> Shabeel
>>
>> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi 
>> wrote:
>>
>>> Hey,
>>>
>>> Really promising roadmap.
>>>
>>> I'd only push for more visualization options. I agree a built-in
>>> visualization with limited charting options is needed, but I think we also
>>> need a way to 'inject' external JS visualizations.
>>>
>>>
>>> For scheduling Zeppelin notebooks we use
>>> https://github.com/airbnb/airflow
>>> through the job REST API. It's an enterprise-ready and very robust solution
>>> right now.
>>>
>>>
>>> *Tamas*
>>>
>>> On 1 March 2016 at 09:12, Eran Witkon  wrote:
>>>
One point to clarify: I don't want to suggest Oozie specifically. I want us
to think about which features we develop and which ones we integrate from
external, preferably Apache, technology. We don't think about building our
own storage services, so why build our own scheduler?
Eran
 On Tue, 1 Mar 2016 at 09:49 moon soo Lee  wrote:

> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling.
> Either external or built-in, I completely agree with having enterprise-level
> job scheduling support on the roadmap.
> ZEPPELIN-137 ,
> ZEPPELIN-531  are
> related issues I can find in our JIRA.
>
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
> notebook storage layer (see the related package
> ).
> So GitHub notebook sync can be implemented easily.
>
> @Shabeel
> Right, we need better memory management to prevent such OOMs.
> And I think the table is one of the most frequently used ways of displaying
> data. So definitely, we'll need more features like filter, sort, etc.
> After this roadmap discussion, a discussion for the next release will
> follow. Then we'll get an idea of when those features will be available.
>
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for
> enterprise use. Definitely Zeppelin will need to address them.
> And displaying meta information of notebooks on the top-level page is a good
> idea.
>
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>
> Thanks,
> moon
>
>
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  wrote:
>
>> Hi,
>>
>> For one, I know that there is rudimentary scheduling built into
>> Zeppelin already (at least I fixed a bug in the test for a scheduling
>> feature a few months ago).
>> But another point is that Zeppelin should also focus on quality,
>> reproducibility, and portability.
>> Although this doesn't offer exciting new features, it would make
>> development much easier.
>>
>> Cross-platform testability, tests that pass when run sequentially,
>> compatibility with Firefox, and many more open issues that make it so much
>> harder to enhance Zeppelin and add features should be addressed soon,
>> preferably before more features are added. Zeppelin is already suffering -
>> in my opinion - from quite a lot of feature creep, and we should avoid
>> putting in the kitchen sink at the cost of quality and maintainability.
>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>
>> Oozie, in my opinion, is a dead end - it may de facto still be in use
>> on many clusters, but it's not getting the love it needs, and I wouldn't
>> bet on it when it comes to integrating scheduling. Instead, any external
>> tool should be able to use the REST API to trigger executions, if you
>> want external scheduling.
>>
>> So, in conclusion, if we take Moon's list as a list of descending
>> priorities, I fully agree, under the condition that code quality is
>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>> SPNEGO SSO support is what we really want) with user and group rights
>> assignment on the notebook level. We probably al

Re: error "Could not find creator property with name 'id' "

2016-03-01 Thread moon soo Lee
I thought https://issues.apache.org/jira/browse/ZEPPELIN-469 resolved this
issue. But if your Zeppelin is based on 0.5.6 or the master branch and you've
faced the problem, could you share how to reproduce it?

Thanks,
moon

On Tue, Mar 1, 2016 at 12:36 PM enzo  wrote:

> Hi Moon
>
> Thanks!!  The fixes proposed in the post resolved my problem.
>
> On the other hand, if this is happening to everybody (as I assume),  maybe
> this should be addressed a bit more systematically??
>
> Thanks again!
>
> Enzo
> e...@smartinsightsfromdata.com
>
>
>
> On 1 Mar 2016, at 19:13, moon soo Lee  wrote:
>
> Hi Enzo,
>
> It happens when you have multiple versions of the Jackson library in your
> classpath. Please check the following email thread:
>
> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/com-fasterxml-jackson-databind-JsonMappingException-td1607.html
>
> Thanks,
> moon
>
> On Tue, Mar 1, 2016 at 8:46 AM enzo 
> wrote:
>
>> I get the following error in a variety of circumstances.
>>
>> I downloaded Zeppelin a couple of days ago.  I use Spark 1.6.0.
>>
>>
>> For example:
>>
>> %spark
>>
>> val raw = sc.textFile("/tmp/github.json")  // reading a 25MB file from
>> /tmp
>>
>> Gives the following error.  Help please!!
>>
>>
>> com.fasterxml.jackson.databind.JsonMappingException: Could not find
>> creator property with name 'id' (in class
>> org.apache.spark.rdd.RDDOperationScope)
>> at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
>> at
>> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
>> at
>> com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
>> at
>> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
>> at
>> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
>> at
>> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
>> at
>> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
>> at
>> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
>> at
>> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
>> at
>> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
>> at
>> com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
>> at
>> com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
>> at
>> com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
>> at
>> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
>> at
>> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
>> at
>> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
>> at
>> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
>> at scala.Option.map(Option.scala:145)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
>> at
>> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
>> at
>> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>> at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>> at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
>> at
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
>> at
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
>> at
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
>> at $iwC$$iwC$$iwC$$iwC$$iwC.(:57)
>> at $iwC$$iwC$$iwC$$iwC.(:59)
>> at $iwC$$iwC$$iwC.(:61)
>> at $iwC$$iwC.(:63)
>> at $iwC.(:65)
>> at (:67)
>> at .(:71)
>> at .()
>> at .(:7)
>> at .()
>> at $print()
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at
>>

Re: Zeppelin Real Time usecases

2016-03-01 Thread moon soo Lee
Hi Shabeel,

Trying
https://gist.github.com/granturing/a09aed4a302a7367be92
would help. It displays tweets on a map in real time, with data from Spark
streaming.

Thanks,
moon

On Tue, Mar 1, 2016 at 12:48 AM Shabeel Syed  wrote:

> Hi All,
>
>I'm planning to give a demo tomorrow to my team here on Zeppelin.
>
>    It would be great if I could get a list of some real-time use cases of
> Zeppelin by prominent companies.
>
> Regards,
> Shabeel
>


Re: error "Could not find creator property with name 'id' "

2016-03-01 Thread enzo
Hi Moon

Thanks!!  The fixes proposed in the post resolved my problem.

On the other hand, if this is happening to everybody (as I assume),  maybe this 
should be addressed a bit more systematically??

Thanks again!

Enzo
e...@smartinsightsfromdata.com



> On 1 Mar 2016, at 19:13, moon soo Lee  wrote:
> 
> Hi Enzo,
> 
> It happens when you have multiple versions of the Jackson library in your 
> classpath. Please check the following email thread:
> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/com-fasterxml-jackson-databind-JsonMappingException-td1607.html
>  
> 
> 
> Thanks,
> moon
> 
> On Tue, Mar 1, 2016 at 8:46 AM enzo  > wrote:
> I get the following error in a variety of circumstances.
> 
> I downloaded Zeppelin a couple of days ago.  I use Spark 1.6.0.
> 
> 
> For example:
> 
> %spark
> 
> val raw = sc.textFile("/tmp/github.json")  // reading a 25MB file from /tmp
> 
> Gives the following error.  Help please!!
> 
> 
> com.fasterxml.jackson.databind.JsonMappingException: Could not find creator 
> property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
>  at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
>   at 
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
>   at 
> com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
>   at 
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
>   at 
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
>   at 
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
>   at 
> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
>   at 
> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
>   at 
> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
>   at 
> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
>   at 
> com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
>   at 
> com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
>   at 
> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
>   at scala.Option.map(Option.scala:145)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>   at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
>   at 
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
>   at 
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>   at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
>   at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.(:57)
>   at $iwC$$iwC$$iwC$$iwC.(:59)
>   at $iwC$$iwC$$iwC.(:61)
>   at $iwC$$iwC.(:63)
>   at $iwC.(:65)
>   at (:67)
>   at .(:71)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodA

Re: error "Could not find creator property with name 'id' "

2016-03-01 Thread moon soo Lee
Hi Enzo,

It happens when you have multiple versions of the Jackson library in your
classpath. Please check the following email thread:
http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/com-fasterxml-jackson-databind-JsonMappingException-td1607.html

Thanks,
moon
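
A quick way to check which Jackson jar actually wins on the interpreter
classpath is to locate the class from a notebook paragraph. This is a
diagnostic sketch, not part of the original reply; the printed path is
environment-specific:

%spark
// Print the jar that provides Jackson's ObjectMapper on the interpreter
// classpath. If it is not the jackson-databind version Spark expects,
// there is a classpath conflict.
val jacksonJar = classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation
println(jacksonJar)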

On Tue, Mar 1, 2016 at 8:46 AM enzo  wrote:

> I get the following error in a variety of circumstances.
>
> I downloaded Zeppelin a couple of days ago.  I use Spark 1.6.0.
>
>
> For example:
>
> %spark
>
> val raw = sc.textFile("/tmp/github.json")  // reading a 25MB file from /tmp
>
> Gives the following error.  Help please!!
>
>
> com.fasterxml.jackson.databind.JsonMappingException: Could not find
> creator property with name 'id' (in class
> org.apache.spark.rdd.RDDOperationScope)
> at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
> at
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
> at
> com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
> at
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
> at
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
> at
> com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
> at
> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
> at
> com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
> at
> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
> at
> com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
> at
> com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
> at
> com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
> at
> com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
> at
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
> at
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
> at
> org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
> at
> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
> at
> org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
> at scala.Option.map(Option.scala:145)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
> at
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
> at
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
> at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
> at
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
> at
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
> at
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
> at $iwC$$iwC$$iwC$$iwC$$iwC.(:57)
> at $iwC$$iwC$$iwC$$iwC.(:59)
> at $iwC$$iwC$$iwC.(:61)
> at $iwC$$iwC.(:63)
> at $iwC.(:65)
> at (:67)
> at .(:71)
> at .()
> at .(:7)
> at .()
> at $print()
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at
> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at
> org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:780)
> at
> org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:744)
> at
> org.a

Re: Math formula on %md model

2016-03-01 Thread moon soo Lee
+1 This would be useful
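
For reference, the kind of input being asked for below: KaTeX/MathJax-style
delimiters in a %md paragraph (illustrative only; this did not render at the
time of this thread):

%md
Inline math like $E = mc^2$ and display math like

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i $$

would be rendered by a KaTeX- or MathJax-enabled markdown interpreter.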

On Tue, Mar 1, 2016 at 7:36 AM Aish Fenton  wrote:

> +1 that'd be amazing.
> On Mon, Feb 29, 2016 at 6:04 PM Trevor Grant 
> wrote:
>
>> +1 for math formula support in the mark down interpreter,
>>
>> Trevor Grant
>> Data Scientist
>> https://github.com/rawkintrevo
>> http://stackexchange.com/users/3002022/rawkintrevo
>> http://trevorgrant.org
>>
>> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>>
>>
>> On Mon, Feb 29, 2016 at 7:57 PM, Jun Chen  wrote:
>>
>>> Hi,
>>>
>>> I found that both Jupyter and GitBook (using KaTeX or MathJax) support math
>>> formulas in markdown mode, so how about Zeppelin? The $$ math formula $$
>>> syntax does not work in Zeppelin.
>>>
>>> BR
>>> mufeng
>>>
>>
>>


Re: [DISCUSS] Update Roadmap

2016-03-01 Thread TEJA SRIVASTAV
+1 on @rick

On Tue, Mar 1, 2016 at 11:26 PM Benjamin Kim  wrote:

> I see in the Enterprise section that multi-tenancy will be included; will
> this include user impersonation too? That way, the executing user would be
> the user owning the process.
>
> On Mar 1, 2016, at 12:51 AM, Shabeel Syed  wrote:
>
> +1
>
> Hi Tamas,
>Pluggable external visualization is really a GREAT feature to have.
> I'm looking forward to this :)
>
> Regards
> Shabeel
>
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi 
> wrote:
>
>> Hey,
>>
>> Really promising roadmap.
>>
>> I'd only push for more visualization options. I agree a built-in visualization
>> with limited charting options is needed, but I think we also need a way to
>> 'inject' external JS visualizations.
>>
>>
>> For scheduling Zeppelin notebooks we use
>> https://github.com/airbnb/airflow
>> through the job REST API. It's an enterprise-ready and very robust solution
>> right now.
>>
>>
>> *Tamas*
>>
>> On 1 March 2016 at 09:12, Eran Witkon  wrote:
>>
>>> One point to clarify: I don't want to suggest Oozie specifically. I want
>>> us to think about which features we develop and which ones we integrate
>>> from external, preferably Apache, technology. We don't think about building
>>> our own storage services, so why build our own scheduler?
>>> Eran
>>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee  wrote:
>>>
 @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
Now I can see a lot of demand around enterprise-level job scheduling.
Either external or built-in, I completely agree with having enterprise-level
job scheduling support on the roadmap.
ZEPPELIN-137 ,
ZEPPELIN-531  are
related issues I can find in our JIRA.

 @Vinayak
Regarding importing notebooks from GitHub, Zeppelin has a pluggable
notebook storage layer (see the related package
).
So GitHub notebook sync can be implemented easily.

 @Shabeel
Right, we need better memory management to prevent such OOMs.
And I think the table is one of the most frequently used ways of displaying
data. So definitely, we'll need more features like filter, sort, etc.
After this roadmap discussion, a discussion for the next release will
follow. Then we'll get an idea of when those features will be available.

 @Prasad
Thanks for mentioning HA and DR. They're really important subjects for
enterprise use. Definitely Zeppelin will need to address them.
And displaying meta information of notebooks on the top-level page is a good
idea.

 It's really great to hear many opinions and ideas.
And thanks @Rick for sharing a valuable view of the Zeppelin project.

 Thanks,
 moon


 On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  wrote:

> Hi,
>
> For one, I know that there is rudimentary scheduling built into
> Zeppelin already (at least I fixed a bug in the test for a scheduling
> feature a few months ago).
> But another point is that Zeppelin should also focus on quality,
> reproducibility, and portability.
> Although this doesn't offer exciting new features, it would make
> development much easier.
>
> Cross-platform testability, tests that pass when run sequentially,
> compatibility with Firefox, and many more open issues that make it so much
> harder to enhance Zeppelin and add features should be addressed soon,
> preferably before more features are added. Zeppelin is already suffering -
> in my opinion - from quite a lot of feature creep, and we should avoid
> putting in the kitchen sink at the cost of quality and maintainability.
> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>
> Oozie, in my opinion, is a dead end - it may de facto still be in use
> on many clusters, but it's not getting the love it needs, and I wouldn't
> bet on it when it comes to integrating scheduling. Instead, any external
> tool should be able to use the REST API to trigger executions, if you want
> external scheduling.
>
> So, in conclusion, if we take Moon's list as a list of descending
> priorities, I fully agree, under the condition that code quality is
> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
> SPNEGO SSO support is what we really want) with user and group rights
> assignment on the notebook level. We probably also need Knox integration
> (ODP members looking at integrating Zeppelin should consider contributing
> this), and integration of something like Spree (
> https://github.com/hammerlab/spree) to be able to profile jobs.
>
> I'm hopeful that soon I can resume contributing some qual

Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Benjamin Kim
I see in the Enterprise section that multi-tenancy will be included; will this 
include user impersonation too? That way, the executing user would be the user 
owning the process.

> On Mar 1, 2016, at 12:51 AM, Shabeel Syed  wrote:
> 
> +1
> 
> Hi Tamas,
>Pluggable external visualization is really a GREAT feature to have. I'm 
> looking forward to this :)
> 
> Regards
> Shabeel
> 
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi  > wrote:
> Hey,
> 
> Really promising roadmap.
> 
> I'd only push for more visualization options. I agree a built-in visualization 
> is needed with limited charting options, but I think we also need a way to 
> 'inject' external JS visualizations.
> 
> 
> For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow 
>  through the job REST API. It's an 
> enterprise-ready and very robust solution right now.
> 
> Tamas
> 
> 
> On 1 March 2016 at 09:12, Eran Witkon  > wrote:
> One point to clarify: I don't want to suggest Oozie specifically. I want us to 
> think about which features we develop and which ones we integrate from 
> external, preferably Apache, technology. We don't think about building our own 
> storage services, so why build our own scheduler?
> Eran 
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee  > wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling. Either 
> external or built-in, I completely agree with having enterprise-level job 
> scheduling support on the roadmap.
> ZEPPELIN-137 , 
> ZEPPELIN-531  are related 
> issues I can find in our JIRA.
> 
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook 
> storage layer (see the related package 
> ).
>  So GitHub notebook sync can be implemented easily.
> 
> @Shabeel
> Right, we need better memory management to prevent such OOMs.
> And I think the table is one of the most frequently used ways of displaying data. 
> So definitely, we'll need more features like filter, sort, etc.
> After this roadmap discussion, a discussion for the next release will follow. 
> Then we'll get an idea of when those features will be available.
> 
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for 
> enterprise use. Definitely Zeppelin will need to address them.
> And displaying meta information of notebooks on the top-level page is a good idea.
> 
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
> 
> Thanks,
> moon
> 
> 
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  > wrote:
> Hi,
> 
> For one, I know that there is rudimentary scheduling built into Zeppelin 
> already (at least I fixed a bug in the test for a scheduling feature a few 
> months ago).
> But another point is that Zeppelin should also focus on quality, 
> reproducibility, and portability.
> Although this doesn't offer exciting new features, it would make development 
> much easier.
> 
> Cross-platform testability, tests that pass when run sequentially, 
> compatibility with Firefox, and many more open issues that make it so much 
> harder to enhance Zeppelin and add features should be addressed soon, 
> preferably before more features are added. Zeppelin is already suffering - in 
> my opinion - from quite a lot of feature creep, and we should avoid putting 
> in the kitchen sink at the cost of quality and maintainability. Instead, 
> modularity (ZEPPELIN-533 in particular) should be targeted.
> 
> Oozie, in my opinion, is a dead end - it may de facto still be in use on many 
> clusters, but it's not getting the love it needs, and I wouldn't bet on it 
> when it comes to integrating scheduling. Instead, any external tool should be 
> able to use the REST API to trigger executions, if you want external 
> scheduling.
> 
> So, in conclusion, if we take Moon's list as a list of descending priorities, 
> I fully agree, under the condition that code quality is included as a subset 
> of enterprise-readiness. Auth* is paramount (Kerberos SPNEGO SSO support is 
> what we really want) with user and group rights assignment on the notebook 
> level. We probably also need Knox integration (ODP members looking at 
> integrating Zeppelin should consider contributing this), and integration of 
> something like Spree (https://github.com/hammerlab/spree 
> ) to be able to profile jobs.
> 
> I'm hopeful that soon I can resume contributing some quality-oriented code, 
> to drive this "necessary evil" forward ;)
> 
> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder  

Re: Can zeppelin send email by using scheduler?

2016-03-01 Thread Felix Cheung
Sounds like it could be an interesting feature to add.
Would you like to contribute? :)






On Tue, Mar 1, 2016 at 3:49 AM -0800, "魏龙星"  wrote:





In that case, users have to write code for every notebook.


Eran Witkon wrote on Tuesday, March 1, 2016 at 7:48 PM:

> I guess that if the scheduler can run a notebook, then the notebook code
> can send the mail.
> Eran
> On Tue, 1 Mar 2016 at 13:38 魏龙星  wrote:
>
>> Zeppelin already supports a scheduler. However, users can only check the
>> results on the web. The scheduler is useless, since users can execute on the
>> web anyway. I am wondering whether Zeppelin can support sending the results
>> to users, like crontab does.
>>
>> Any suggestions?
>>
>> Thanks.
>> Longxing
>>
>


error "Could not find creator property with name 'id' "

2016-03-01 Thread enzo
I get the following error in a variety of circumstances.

I downloaded Zeppelin a couple of days ago.  I use Spark 1.6.0.


For example:

%spark

val raw = sc.textFile("/tmp/github.json")  // reading a 25MB file from /tmp

Gives the following error.  Help please!!


com.fasterxml.jackson.databind.JsonMappingException: Could not find creator 
property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
at 
com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at 
com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at 
com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at 
com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at 
com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at 
com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at 
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at 
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at 
org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:85)
at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:136)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1011)
at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:832)
at 
org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:830)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:830)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at 
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:51)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:53)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:55)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:57)
at $iwC$$iwC$$iwC$$iwC.(:59)
at $iwC$$iwC$$iwC.(:61)
at $iwC$$iwC.(:63)
at $iwC.(:65)
at (:67)
at .(:71)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:780)
at 
org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:744)
at 
org.apache.zeppelin.spark.Spark

Re: Math formula on %md model

2016-03-01 Thread Aish Fenton
+1 that'd be amazing.
On Mon, Feb 29, 2016 at 6:04 PM Trevor Grant 
wrote:

> +1 for math formula support in the mark down interpreter,
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
>
> On Mon, Feb 29, 2016 at 7:57 PM, Jun Chen  wrote:
>
>> Hi,
>>
>> I found that both Jupyter and GitBook (using KaTeX or MathJax) support math
>> formulas in markdown mode, so how about Zeppelin? The $$ math formula $$
>> syntax does not work in Zeppelin.
>>
>> BR
>> mufeng
>>
>
>


RE: problem with start H2OContent

2016-03-01 Thread Silvio Fiorito
Make sure the quotes are around the value, not the whole variable, in 
zeppelin-env.sh:

export SPARK_HOME="/path_with_spark_1.5"

export SPARK_SUBMIT_OPTIONS="--packages ai.h2o:sparkling-water-core_2.10:1.5.10"



From: Aleksandr Modestov
Sent: Tuesday, March 1, 2016 9:15 AM
To: 
users@zeppelin.incubator.apache.org
Subject: Re: problem with start H2OContent

Zeppelin doesn't work with the external Apache Spark, but I can launch Spark with 
the h2o package from the shell.
I use the line export "SPARK_HOME=/path_with_spark_1.5" in zeppelin-env.sh.
But I'm not sure that Zeppelin sees the external interpreter.

On Mon, Feb 29, 2016 at 7:35 PM, Silvio Fiorito 
mailto:silvio.fior...@granturing.com>> wrote:

Can you try running it from just a Spark shell to confirm it works that way (no 
other conflict)?

bin/spark-shell --master local[*] --packages 
ai.h2o:sparkling-water-core_2.10:1.5.10

Also, are you able to run the Spark interpreter without the h2o package?

Thanks,
Silvio

From: Aleksandr Modestov
Sent: Monday, February 29, 2016 11:30 AM
To: 
users@zeppelin.incubator.apache.org
Subject: Re: problem with start H2OContent

I use Spark 1.5.
The problem is with the external Spark; with the internal Spark I cannot launch 
h2oContent :)
The error is:

"ERROR [2016-02-29 19:28:16,609] ({pool-1-thread-3} 
NotebookServer.java[afterStatusChange]:766) - Error
org.apache.zeppelin.interpreter.InterpreterException: 
org.apache.zeppelin.interpreter.InterpreterException: 
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:268)
at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:198)
at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
at 
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:322)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zeppelin.interpreter.InterpreterException: 
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused
at 
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
at 
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at 
org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at 
org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at 
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at 
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:266)
... 11 more
Caused by: org.apache.thrift.transport.TTransportException: 
java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at 
org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
... 18 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 19 more"

On Mon, Feb 29, 2016 at 7:07 PM, Silvio Fiorito 
mailto:silvio.fior...@granturing.com>> wrote:
In your zeppelin-env did you set SPARK_HOME and SPARK_SUBMIT_OPTIONS? Anything in 
the logs? It looks like the interpreter failed to start.

Also, Sparkling Water currently supports up to 1.5 only, last I checked.

Thanks,
Silvio



From: Aleksandr Modestov
Sent: Monday, February 29, 2016 10:43 AM
To: 
users@zeppelin.incubator.apache.org
Subject: Re: problem with start H2OCon

Re: problem with start H2OContent

2016-03-01 Thread Aleksandr Modestov
Zeppelin doesn't work with the external Apache Spark, but I can launch Spark
with the h2o package from the shell.
I use the line export "SPARK_HOME=/path_with_spark_1.5" in zeppelin-env.sh.
But I'm not sure that Zeppelin sees the external interpreter.
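
(A quick sanity check, not from the original message: print the Spark version
from a paragraph and compare it against the external installation.)

%spark
// If zeppelin-env.sh is being read, this should print the external Spark's
// version (1.5.x here) rather than the bundled one.
println(sc.version)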

On Mon, Feb 29, 2016 at 7:35 PM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

>
>
> Can you try running it from just a Spark shell to confirm it works that
> way (no other conflict)?
>
>
>
> bin/spark-shell --master local[*] --packages
> ai.h2o:sparkling-water-core_2.10:1.5.10
>
>
>
> Also, are you able to run the Spark interpreter without the h2o package?
>
>
>
> Thanks,
>
> Silvio
>
>
>
> *From: *Aleksandr Modestov 
> *Sent: *Monday, February 29, 2016 11:30 AM
> *To: *users@zeppelin.incubator.apache.org
> *Subject: *Re: problem with start H2OContent
>
>
> I use Spark 1.5.
> The problem is with the external Spark; with the internal Spark I cannot
> launch h2oContent :)
> The error is:
>
> "ERROR [2016-02-29 19:28:16,609] ({pool-1-thread-3}
> NotebookServer.java[afterStatusChange]:766) - Error
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:268)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
> at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:198)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
> at
> org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:322)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zeppelin.interpreter.InterpreterException:
> org.apache.thrift.transport.TTransportException: java.net.ConnectException:
> Connection refused
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
> at
> org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
> at
> org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:266)
> ... 11 more
> Caused by: org.apache.thrift.transport.TTransportException:
> java.net.ConnectException: Connection refused
> at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
> at
> org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
> ... 18 more
> Caused by: java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:579)
> at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
> ... 19 more"
>
> On Mon, Feb 29, 2016 at 7:07 PM, Silvio Fiorito <
> silvio.fior...@granturing.com> wrote:
>
>> In your zeppelin-env did you set SPARK_HOME and SPARK_SUBMIT_OPTIONS?
>> Anything in the logs? It looks like the interpreter failed to start.
>>
>>
>>
>> Also, Sparkling Water currently supports up to 1.5 only, last I checked.
>>
>>
>>
>> Thanks,
>>
>> Silvio
>>
>>
>>
>>
>>
>>
>>
>> *From: *Aleksandr Modestov 
>> *Sent: *Monday, February 29, 2016 10:43 AM
>> *To: *users@zeppelin.incubator.apache.org
>> *Subject: *Re: problem with start H2OContent
>>
>>
>> When I use external Spark I get an exception:
>>
>> java.net.ConnectException: Connection refused at
>> java.net.PlainSocketImpl.socketConnect(Native Method) at
>> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> at
>> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>> at
>> java.net.AbstractP

Re: Can zeppelin send email by using scheduler?

2016-03-01 Thread 魏龙星
In that case, users have to write code for every notebook.


Eran Witkon wrote on Tuesday, March 1, 2016 at 7:48 PM:

> I guess that if the scheduler can run a notebook, then the notebook code
> can send the mail.
> Eran
> On Tue, 1 Mar 2016 at 13:38 魏龙星  wrote:
>
>> Zeppelin already supports a scheduler. However, users can only check the
>> results on the web. The scheduler is useless, since users can execute on the
>> web anyway. I am wondering whether Zeppelin can support sending the results
>> to users, like crontab does.
>>
>> Any suggestions?
>>
>> Thanks.
>> Longxing
>>
>


Re: Can zeppelin send email by using scheduler?

2016-03-01 Thread Eran Witkon
I guess that if the scheduler can run a notebook, then the notebook code can
send the mail.
Eran
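
A minimal sketch of what such a final notebook paragraph could look like; it
assumes a javax.mail jar on the interpreter classpath, and the SMTP host and
addresses are placeholders:

%spark
// Last paragraph of a scheduled notebook: mail a summary of the run.
import java.util.Properties
import javax.mail.{Message, Session, Transport}
import javax.mail.internet.{InternetAddress, MimeMessage}

val props = new Properties()
props.put("mail.smtp.host", "smtp.example.com")  // placeholder SMTP relay
val mailSession = Session.getInstance(props)

val msg = new MimeMessage(mailSession)
msg.setFrom(new InternetAddress("zeppelin@example.com"))
msg.setRecipients(Message.RecipientType.TO, "user@example.com")
msg.setSubject("Scheduled notebook finished")
msg.setText("Result summary goes here.")  // e.g. a count computed above
Transport.send(msg)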
On Tue, 1 Mar 2016 at 13:38 魏龙星  wrote:

> Zeppelin already supports a scheduler. However, users can only check the
> results on the web. The scheduler is useless, since users can execute on the
> web anyway. I am wondering whether Zeppelin can support sending the results
> to users, like crontab does.
>
> Any suggestions?
>
> Thanks.
> Longxing
>


Can zeppelin send email by using scheduler?

2016-03-01 Thread 魏龙星
Zeppelin already supports a scheduler. However, users can only check the
results on the web. The scheduler is useless, since users can execute on the
web anyway. I am wondering whether Zeppelin can support sending the results
to users, like crontab does.

Any suggestions?

Thanks.
Longxing


Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Shabeel Syed
+1

Hi Tamas,
   Pluggable external visualization is really a GREAT feature to have. I'm
looking forward to this :)

Regards
Shabeel

On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi 
wrote:

> Hey,
>
> Really promising roadmap.
>
> I'd only push for more visualization options. I agree a built-in visualization
> with limited charting options is needed, but I think we also need a way to
> 'inject' external JS visualizations.
>
>
> For scheduling Zeppelin notebooks we use
>  https://github.com/airbnb/airflow  through
> the job REST API. It's an enterprise-ready and very robust solution right
> now.
>
>
> *Tamas*
>
> On 1 March 2016 at 09:12, Eran Witkon  wrote:
>
>> One point to clarify: I don't want to suggest Oozie specifically. I want
>> us to think about which features we develop and which ones we integrate
>> from external, preferably Apache, technology. We don't think about building
>> our own storage services, so why build our own scheduler?
>> Eran
>> On Tue, 1 Mar 2016 at 09:49 moon soo Lee  wrote:
>>
>>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>>> Now I can see a lot of demand around enterprise-level job scheduling.
>>> Either external or built-in, I completely agree with having enterprise-level
>>> job scheduling support on the roadmap.
>>> ZEPPELIN-137 ,
>>> ZEPPELIN-531  are
>>> related issues I can find in our JIRA.
>>>
>>> @Vinayak
>>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable
>>> notebook storage layer (see the related package
>>> ).
>>> So GitHub notebook sync can be implemented easily.
>>>
>>> @Shabeel
>>> Right, we need better memory management to prevent such OOMs.
>>> And I think the table is one of the most frequently used ways of displaying
>>> data. So definitely, we'll need more features like filter, sort, etc.
>>> After this roadmap discussion, a discussion for the next release will
>>> follow. Then we'll get an idea of when those features will be available.
>>>
>>> @Prasad
>>> Thanks for mentioning HA and DR. They're really important subjects for
>>> enterprise use. Definitely Zeppelin will need to address them.
>>> And displaying meta information of notebooks on the top-level page is a good
>>> idea.
>>>
>>> It's really great to hear many opinions and ideas.
>>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>>
>>> Thanks,
>>> moon
>>>
>>>
>>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  wrote:
>>>
 Hi,

 For one, I know that there is rudimentary scheduling built into
 Zeppelin already (at least I fixed a bug in the test for a scheduling
 feature a few months ago).
But another point is that Zeppelin should also focus on quality,
reproducibility, and portability.
Although this doesn't offer exciting new features, it would make
development much easier.

Cross-platform testability, tests that pass when run sequentially,
compatibility with Firefox, and many more open issues that make it so much
harder to enhance Zeppelin and add features should be addressed soon,
preferably before more features are added. Zeppelin is already suffering -
in my opinion - from quite a lot of feature creep, and we should avoid
putting in the kitchen sink at the cost of quality and maintainability.
Instead, modularity (ZEPPELIN-533 in particular) should be targeted.

Oozie, in my opinion, is a dead end - it may de facto still be in use
on many clusters, but it's not getting the love it needs, and I wouldn't
bet on it when it comes to integrating scheduling. Instead, any external
tool should be able to use the REST API to trigger executions, if you want
external scheduling.

So, in conclusion, if we take Moon's list as a list of descending
priorities, I fully agree, under the condition that code quality is
included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
SPNEGO SSO support is what we really want) with user and group rights
assignment on the notebook level. We probably also need Knox integration
(ODP members looking at integrating Zeppelin should consider contributing
this), and integration of something like Spree (
https://github.com/hammerlab/spree) to be able to profile jobs.

 I'm hopeful that soon I can resume contributing some quality-oriented
 code, to drive this "necessary evil" forward ;)

 On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
 sourav.mazumde...@gmail.com> wrote:

> I do agree with Vinayak. It need not be coupled with Oozie.
>
> Rather, one should be able to call it from any scheduler typically used
> at the enterprise level. Maybe support for BPML.
>
> I believe the existing ability to call/e

Zeppelin Real Time usecases

2016-03-01 Thread Shabeel Syed
Hi All,

   I'm planning to give a demo tomorrow to my team here on Zeppelin.

   It would be great if I could get a list of some real-time use cases of
Zeppelin by prominent companies.

Regards,
Shabeel


Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Tamas Szuromi
Hey,

Really promising roadmap.

I'd only push for more visualization options. I agree a built-in visualization is
needed with limited charting options, but I think we also need a way to
'inject' external JS visualizations.


For scheduling Zeppelin notebooks we use https://github.com/airbnb/airflow
 through the job REST API. It's an
enterprise-ready and very robust solution right now.


*Tamas*
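
As an illustration of driving a run through the job REST API from an external
scheduler (a sketch: the host and note ID are placeholders, and POST
/api/notebook/job/{noteId} is the endpoint that runs all paragraphs of a note):

// Trigger all paragraphs of a note via Zeppelin's REST API.
import java.net.{HttpURLConnection, URL}

val noteId = "2A94M5J1Z"  // placeholder note ID
val url = new URL(s"http://localhost:8080/api/notebook/job/$noteId")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
println(s"HTTP ${conn.getResponseCode}")  // 200 means the job was submitted
conn.disconnect()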

On 1 March 2016 at 09:12, Eran Witkon  wrote:

> One point to clarify: I don't want to suggest Oozie specifically. I want us to
> think about which features we develop and which ones we integrate from external,
> preferably Apache, technology. We don't think about building our own storage
> services, so why build our own scheduler?
> Eran
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee  wrote:
>
>> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
>> Now I can see a lot of demand around enterprise-level job scheduling.
>> Either external or built-in, I completely agree with having enterprise-level
>> job scheduling support on the roadmap.
>> ZEPPELIN-137 ,
>> ZEPPELIN-531  are
>> related issues I can find in our JIRA.
>>
>> @Vinayak
>> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
>> storage layer (see the related package
>> ).
>> So GitHub notebook sync can be implemented easily.
>>
>> @Shabeel
>> Right, we need better memory management to prevent such OOMs.
>> And I think the table is one of the most frequently used ways of displaying
>> data. So definitely, we'll need more features like filter, sort, etc.
>> After this roadmap discussion, a discussion for the next release will
>> follow. Then we'll get an idea of when those features will be available.
>>
>> @Prasad
>> Thanks for mentioning HA and DR. They're really important subjects for
>> enterprise use. Definitely Zeppelin will need to address them.
>> And displaying meta information of notebooks on the top-level page is a good
>> idea.
>>
>> It's really great to hear many opinions and ideas.
>> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>>
>> Thanks,
>> moon
>>
>>
>> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  wrote:
>>
>>> Hi,
>>>
>>> For one, I know that there is rudimentary scheduling built into Zeppelin
>>> already (at least I fixed a bug in the test for a scheduling feature a few
>>> months ago).
>>> But another point is that Zeppelin should also focus on quality,
>>> reproducibility, and portability.
>>> Although this doesn't offer exciting new features, it would make
>>> development much easier.
>>>
>>> Cross-platform testability, tests that pass when run sequentially,
>>> compatibility with Firefox, and many more open issues that make it so much
>>> harder to enhance Zeppelin and add features should be addressed soon,
>>> preferably before more features are added. Zeppelin is already suffering -
>>> in my opinion - from quite a lot of feature creep, and we should avoid
>>> putting in the kitchen sink at the cost of quality and maintainability.
>>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>>
>>> Oozie, in my opinion, is a dead end - it may de facto still be in use on
>>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>>> on it when it comes to integrating scheduling. Instead, any external tool
>>> should be able to use the REST API to trigger executions, if you want
>>> external scheduling.
>>>
>>> So, in conclusion, if we take Moon's list as a list of descending
>>> priorities, I fully agree, under the condition that code quality is
>>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>>> SPNEGO SSO support is what we really want) with user and group rights
>>> assignment on the notebook level. We probably also need Knox integration
>>> (ODP members looking at integrating Zeppelin should consider contributing
>>> this), and integration of something like Spree (
>>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>>
>>> I'm hopeful that soon I can resume contributing some quality-oriented
>>> code, to drive this "necessary evil" forward ;)
>>>
>>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>>> sourav.mazumde...@gmail.com> wrote:
>>>
 I do agree with Vinayak. It need not be coupled with Oozie.

Rather, one should be able to call it from any scheduler typically used
at the enterprise level. Maybe support for BPML.

I believe the existing ability to call/execute a Zeppelin notebook or a
specific paragraph within a notebook using the REST API should take care of
this requirement to some extent.

 Regards,
 Sourav

 On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
 vinayakagrawa...@gmail.com> wrote:

> @Eran Witkon,
>>>

Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Eran Witkon
One point to clarify: I don't want to suggest Oozie specifically. I want us to
think about which features we develop and which ones we integrate from external,
preferably Apache, technology. We don't think about building our own storage
services, so why build our own scheduler?
Eran
On Tue, 1 Mar 2016 at 09:49 moon soo Lee  wrote:

> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demand around enterprise-level job scheduling.
> Either external or built-in, I completely agree with having enterprise-level
> job scheduling support on the roadmap.
> ZEPPELIN-137 ,
> ZEPPELIN-531  are
> related issues I can find in our JIRA.
>
> @Vinayak
> Regarding importing notebooks from GitHub, Zeppelin has a pluggable notebook
> storage layer (see the related package
> ).
> So GitHub notebook sync can be implemented easily.
>
> @Shabeel
> Right, we need better memory management to prevent such OOMs.
> And I think the table is one of the most frequently used ways of displaying
> data. So definitely, we'll need more features like filter, sort, etc.
> After this roadmap discussion, a discussion for the next release will
> follow. Then we'll get an idea of when those features will be available.
>
> @Prasad
> Thanks for mentioning HA and DR. They're really important subjects for
> enterprise use. Definitely Zeppelin will need to address them.
> And displaying meta information of notebooks on the top-level page is a good idea.
>
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing a valuable view of the Zeppelin project.
>
> Thanks,
> moon
>
>
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  wrote:
>
>> Hi,
>>
>> For one, I know that there is rudimentary scheduling built into Zeppelin
>> already (at least I fixed a bug in the test for a scheduling feature a few
>> months ago).
>> But another point is that Zeppelin should also focus on quality,
>> reproducibility, and portability.
>> Although this doesn't offer exciting new features, it would make
>> development much easier.
>>
>> Cross-platform testability, tests that pass when run sequentially,
>> compatibility with Firefox, and many more open issues that make it so much
>> harder to enhance Zeppelin and add features should be addressed soon,
>> preferably before more features are added. Zeppelin is already suffering -
>> in my opinion - from quite a lot of feature creep, and we should avoid
>> putting in the kitchen sink at the cost of quality and maintainability.
>> Instead, modularity (ZEPPELIN-533 in particular) should be targeted.
>>
>> Oozie, in my opinion, is a dead end - it may de facto still be in use on
>> many clusters, but it's not getting the love it needs, and I wouldn't bet
>> on it when it comes to integrating scheduling. Instead, any external tool
>> should be able to use the REST API to trigger executions, if you want
>> external scheduling.
>>
>> So, in conclusion, if we take Moon's list as a list of descending
>> priorities, I fully agree, under the condition that code quality is
>> included as a subset of enterprise-readiness. Auth* is paramount (Kerberos
>> SPNEGO SSO support is what we really want) with user and group rights
>> assignment on the notebook level. We probably also need Knox integration
>> (ODP members looking at integrating Zeppelin should consider contributing
>> this), and integration of something like Spree (
>> https://github.com/hammerlab/spree) to be able to profile jobs.
>>
>> I'm hopeful that soon I can resume contributing some quality-oriented
>> code, to drive this "necessary evil" forward ;)
>>
>> On Mon, Feb 29, 2016 at 8:27 PM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>>> I do agree with Vinayak. It need not be coupled with Oozie.
>>>
>>> Rather, one should be able to call it from any scheduler typically used
>>> at the enterprise level. Maybe support for BPML.
>>>
>>> I believe the existing ability to call/execute a Zeppelin notebook or a
>>> specific paragraph within a notebook using the REST API should take care of
>>> this requirement to some extent.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Mon, Feb 29, 2016 at 11:23 AM, Vinayak Agrawal <
>>> vinayakagrawa...@gmail.com> wrote:
>>>
 @Eran Witkon,
Thanks for the suggestion, Eran. I concur with your thought.
If Zeppelin can be integrated with Oozie, that would be wonderful. Users
will also be able to leverage their Oozie skills.
This would be promising for now.
However, in the future Hadoop might not necessarily be installed in a
Spark cluster, and Oozie (since it installs with a Hadoop distribution) might
not be available.
So perhaps we should give some thought to this feature for the future.
Should it depend on Oozie, or should Zeppelin have its own scheduling?

 As Benjamin has