Re: Multi-line scripts in spark interpreter

2018-07-12 Thread Sanjay Dasgupta
Jeff Zhang's comment here

may be useful.

Regards,
Sanjay

On Fri, Jul 13, 2018 at 1:01 AM, Paul Brenner  wrote:

> This behavior is coming from the new spark interpreter. Jeff opened
> ZEPPELIN-3587 to fix it. In the meantime you can use the old spark
> interpreter (set zeppelin.spark.useNew to false) to get around this.
> Hopefully you aren't dependent on the new spark interpreter.
>
> Paul Brenner
> SR. DATA SCIENTIST
> (217) 390-3033

Re: [DISCUSS] Share Data in Zeppelin

2018-07-12 Thread Sanjay Dasgupta
I prefer 2.b also. Could we use (save*Result*AsTable=people) instead?

There are a few typos in the example note shared:

1) The line val peopleDF = spark.read.format("zeppelin").load() should
mention the table name (possibly as argument to load?)
2) The python line val peopleDF = z.getTable("people").toPandas() should
not have the val


The z.getTable() method could be a very good tool to judge
which use-cases are important in the community. It is easy to implement for
the in-memory data case, and could be very useful for many situations where
a small amount of data is being transferred across interpreters (like the
jdbc -> matplotlib case mentioned).
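
To make the in-memory case concrete, here is a minimal sketch of how a named table might be passed from one paragraph to another. It is only an illustration under stated assumptions: InMemoryResourcePool, put_table and get_table are hypothetical names invented for this example and are not Zeppelin's ResourcePool or ZeppelinContext API.

```python
import threading


class InMemoryResourcePool:
    """Hypothetical stand-in for a shared pool; not Zeppelin's actual API."""

    def __init__(self):
        self._tables = {}
        self._lock = threading.Lock()

    def put_table(self, name, columns, rows):
        # Store a snapshot so later changes by the writer do not leak to readers.
        with self._lock:
            self._tables[name] = (list(columns), [tuple(r) for r in rows])

    def get_table(self, name):
        # Raises KeyError if no paragraph has saved a table under this name.
        with self._lock:
            columns, rows = self._tables[name]
        return list(columns), list(rows)


pool = InMemoryResourcePool()

# "Writer" paragraph, e.g. a query result saved under the name `people`.
pool.put_table("people", ["name", "age"], [("alice", 30), ("bob", 25)])

# "Reader" paragraph in another interpreter, e.g. just before plotting.
columns, rows = pool.get_table("people")
ages = [row[columns.index("age")] for row in rows]
print(ages)
```

Looking tables up by a user-chosen name rather than by note or paragraph id is what keeps the reader side working after a note is cloned.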

Thanks,
Sanjay

On Fri, Jul 13, 2018 at 8:07 AM, Jongyoul Lee  wrote:

> Yes, it's similar to 2.b.
>
> Basically, my concern is to handle all kinds of data. But in your case, it
> looks like focusing on table data. It's also useful but it would be better
> to handle all of the data including table or plain text as well. WDYT?
>
> About storage, we could discuss it later.
>
> On Fri, Jul 13, 2018 at 11:25 AM, Jeff Zhang  wrote:
>
>>
>> I think your use case is the same as 2.b.  Personally I don't recommend
>> using z.get(noteId, paragraphId) to get the shared data, for 2 reasons:
>> 1. noteId and paragraphId are meaningless, which is not readable.
>> 2. The note will break if we clone it, as the noteId changes.
>> That's why I suggest using a paragraph property to save the paragraph's result.
>>
>> Regarding the intermediate storage, I also thought about it and agree that
>> in the long term we should provide such layer to support large data,
>> currently we put the shared data in memory which is not a scalable
>> solution.  One candidate in my mind is alluxio [1], and regarding the data
>> format I think apache arrow [2] is another good option for zeppelin to
>> share table data across interpreter processes and different languages. But
>> these are all implementation details, I think we can talk about them in
>> another thread. In this thread, I think we should focus on the user facing
>> api.
>>
>>
>> [1] http://www.alluxio.org/
>> [2] https://arrow.apache.org/
>>
>>
>>
>> On Fri, Jul 13, 2018 at 10:11 AM, Jongyoul Lee wrote:
>>
>>> I have a bit different idea to share data.
>>>
>>> In my case,
>>>
>>> It would be very useful to get a paragraph's result as an input of other
>>> paragraphs.
>>>
>>> e.g.
>>>
>>> -- Paragraph 1
>>> %jdbc
>>> select * from some_table;
>>>
>>> -- Paragraph 2
>>> %spark
>>> val rdd = z.get("noteId", "paragraphId").parse.makeRddByMyself
>>> spark.read(table).select
>>>
>>> If paragraph 1's result is too big to show on FE, it would be saved in
>>> Zeppelin Server in a proper way and passed to SparkInterpreter when
>>> Paragraph 2 is executed.
>>>
>>> Basically, I think we need an intermediate storage to store paragraphs'
>>> results to share them. We can introduce another layer or extend
>>> NotebookRepo. In some cases, we might change notebook repos as well.
>>>
>>> JL
>>>
>>>
>>>
>>> On Fri, Jul 13, 2018 at 10:39 AM, Jeff Zhang  wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> Recently, there have been several tickets [1][2][3] about sharing data in
>>>> Zeppelin. Zeppelin's goal is to be a unified data analysis platform which
>>>> can integrate most of the big data tools and help users switch between
>>>> tools and share data between tools easily. So sharing data is a very
>>>> critical and killer feature of Zeppelin IMHO.
>>>>
>>>> I raise this ticket to discuss the scenarios of sharing data and how to
>>>> do that. Although Zeppelin already provides tools and APIs to share data,
>>>> I don't think they are mature and stable enough. After seeing these
>>>> tickets, I think it might be a good time to talk about it in the
>>>> community and gather more feedback, so that we can provide a more stable
>>>> and mature approach for it.
>>>>
>>>> Currently, there are 3 approaches to share data between interpreters and
>>>> interpreter processes.
>>>> 1. Sharing data across interpreters in the same interpreter process, like
>>>> sharing data via the same SparkContext in %spark, %spark.pyspark and
>>>> %spark.r.
>>>> 2. Sharing data between frontend and backend via angularObject
>>>> 3. Sharing data across interpreter processes via Zeppelin's ResourcePool
>>>>
>>>> For this thread, I would like to talk about approach 3 (sharing data
>>>> via Zeppelin's ResourcePool).
>>>>
>>>> Here's my current thinking on sharing data.
>>>> 1. What kind of data would be shared?
>>>> IMHO, users would share 2 kinds of data: primitive data (string, number)
>>>> and table data.
>>>>
>>>> 2. How to write shared data?
>>>> Users may want to share data via 2 approaches:
>>>> a. Use ZeppelinContext (e.g. z.put).
>>>> b. Share the paragraph result via paragraph properties, e.g. a user may
>>>> want to read data from an Oracle database via the jdbc interpreter and
>>>> then do plotting in the python interpreter. In such a scenario, he can
>>>> save the jdbc

Re: [DISCUSS] Is interpreter binding necessary ?

2018-07-06 Thread Sanjay Dasgupta
If there is no per-interpreter overhead of binding all the interpreters
from the beginning, we should definitely do it. This will simplify the GUI
somewhat.

Regards,
- Sanjay


On Fri, Jul 6, 2018 at 1:49 PM, Partridge, Lucas (GE Aviation) <
lucas.partri...@ge.com> wrote:

> “So usually we would recommend user to specify the full qualified
> interpreter name.”
>
> - I usually recommend the exact opposite to our users. We frequently
> change interpreter groups to allow for different Spark cluster settings
> (number of executors, memory, etc). Users with more demanding requirements
> are asked to use custom interpreter groups with more allocated resources.
> If users included the interpreter group name at the start of every
> paragraph they would then have to manually edit the start of every
> paragraph before they could run their note using a different interpreter
> group. Very tedious!
>
>
>
> But I agree the short names without the interpreter group are often
> ambiguous and can cause confusion.  Maybe somewhere in the execution output
> of each paragraph there should be some discrete text giving the fully
> qualified name of the interpreter that was actually used to produce that
> output. Or a clearly defined ‘default interpreter group’ text in the
> toolbar at the top of each notebook. Make it a dropdown so it would be easy
> to change the default.
>
>
>
> *From:* Jeff Zhang 
> *Sent:* 06 July 2018 08:53
> *To:* users@zeppelin.apache.org
> *Cc:* dev 
> *Subject:* EXT: Re: [DISCUSS] Is interpreter binding necessary ?
>
>
>
>
>
> We already allow setting default interpreter when creating note. Another
> way to set default interpreter is to reorder the interpreter setting
> binding in note page.
>
>
>
> But personally I don't recommend that users use the short interpreter name
> that relies on the default interpreter. 2 reasons:
>
> 1. It introduces inaccurate info. E.g. in our product, we have 2 spark
> interpreters (`spark` for spark 1.x and `spark2` for spark 2.x). Users
> often specify `%spark` for the spark interpreter, but it could mean either
> `%spark.spark` or `%spark2.spark`. So it is usually very hard to tell
> what's wrong when a user expects to work with spark2 but is actually still
> using spark 1.x. So usually we would recommend user to specify the full
> qualified interpreter name. Typing several more characters costs just 2
> seconds but makes the note clearer and more readable.
>
> 2. Another issue is that interpreter binding is stored in
> interpreter.json, that means if they export this note to another zeppelin
> instance, the default interpreter won't work.
>
>
>
> So I don't think setting default interpreter via interpreter binding is
> valuable for users. If user really want to do that, I would suggest to
> store it in note.json instead of interpreter.json
>
>
>
>
>
> On Fri, Jul 6, 2018 at 3:36 PM, Jongyoul Lee wrote:
>
> There are two purposes of interpreter binding. One is what you mentioned
> and another one is to manage a default interpreter. If we provide a new way
> to set default interpreter, I think we can remove them :-) We could set
> permissions in other ways.
>
>
>
> Overall, +1
>
>
>
> On Fri, Jul 6, 2018 at 4:24 PM, Jeff Zhang  wrote:
>
> Hi Folks,
>
>
>
> I raise this thread to discuss whether we need the interpreter binding.
> Currently when users create notes, they have to bind interpreters to their
> notes in the note page. Otherwise they will hit an interpreter-not-found
> issue. Besides that, on the zeppelin server side, we maintain the
> interpreter binding info in memory as well as in interpreter.json.
>
> IMHO, it is not necessary to do interpreter binding, because it just adds
> extra burden to maintain the interpreter binding info on the zeppelin
> server side, and doesn't introduce any benefits. The only benefit is that
> we check whether the user has permission to use this interpreter, but
> zeppelin will check the permission anyway when running a paragraph, so I
> don't think we need interpreter binding just for a permission check that
> we will do later.
>
>
>
> So overall, I would suggest to remove interpreter binding feature.  What
> do you think ?
>
>
>
>
>
> --
>
> 이종열, Jongyoul Lee, 李宗烈
>
> http://madeng.net
>
>


Re: Accepting password as an input

2018-06-21 Thread Sanjay Dasgupta
Issue [ZEPPELIN-2528] Add a password text input to the ZeppelinContext is
probably close to this requirement.

Regards,
Sanjay

On Fri, Jun 22, 2018 at 6:12 AM, Jeff Zhang  wrote:

>
> I am afraid it is not supported yet. Would you mind creating a ticket for
> that? It would also be nice if you could share your use case.
>
>
>
> On Thu, Jun 21, 2018 at 7:17 PM, Shirish Deshmukh wrote:
>
>> Hi,
>>
>> I am using ZeppelinContext to create a form to accept various fields from
>> a user for a notebook. Is there any way to accept password without it being
>> visible on the screen?  We can use z.input() as shown in docs but the
>> password is displayed on the screen.
>>
>> regards,
>> Shirish
>>
>>


Re: note imports broken?

2018-05-24 Thread Sanjay Dasgupta
Yes, it does work. Thanks for the info Prabhjyot Singh.


On Thu, May 24, 2018 at 12:14 PM, Prabhjyot Singh <prabhjyotsi...@gmail.com>
wrote:

> This is the same problem that I've described here (
> https://issues.apache.org/jira/browse/ZEPPELIN-3485?focusedCommentId=16487844&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16487844).
> So, instead of that notebook, which is a String (it starts with
> double-quotes), try the one attached; it will work.
>
> On Thu, 24 May 2018 at 12:06, Sanjay Dasgupta <sanjay.dasgu...@gmail.com>
> wrote:
>
>> I tried 0.8.0-RC2 to 0.8.0-RC2.
>>
>> Attempted to import a really small trivial notebook exported just a
>> minute before from the same zeppelin instance. More details are here
>> <https://issues.apache.org/jira/browse/ZEPPELIN-1028?focusedCommentId=16478853&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16478853>
>> .
>>
>> Thanks
>>
>> On Thu, May 24, 2018 at 9:35 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>>
>>> Import from which version of zeppelin notes to 0.8 ?
>>>
>>>
>>> On Thu, May 24, 2018 at 11:52 AM, Sanjay Dasgupta <sanjay.dasgu...@gmail.com> wrote:
>>>
>>>> I noticed this some time back too, but reported it here
>>>> <https://issues.apache.org/jira/browse/ZEPPELIN-1028?focusedCommentId=16478853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16478853>
>>>>
>>>> On Wed, May 23, 2018 at 10:00 PM, Ruslan Dautkhanov <
>>>> dautkha...@gmail.com> wrote:
>>>>
>>>>> Was anybody able to import notes on 0.8 RC or a recent master
>>>>> snapshot?
>>>>> Notes import seems to be broken
>>>>> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3485
>>>>> This looks serious to me.
>>>>>
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>
>>>>
>>
>
> --
> Thankx and Regards,
>
> Prabhjyot Singh
>
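
The workaround described above (the exported file is a String that starts with double-quotes) suggests the note JSON may have been encoded a second time as a JSON string. That reading is an assumption, not something the thread confirms. Under that assumption, a short script outside Zeppelin could unwrap such a file before importing it; load_note and the sample note below are made up for illustration.

```python
import json


def load_note(text):
    """Parse an exported note, unwrapping one extra layer of JSON string encoding."""
    note = json.loads(text)
    if isinstance(note, str):
        # The file held a JSON string literal whose content is the real note JSON.
        note = json.loads(note)
    if not isinstance(note, dict):
        raise ValueError("expected a JSON object for the note")
    return note


# A made-up, minimal note for illustration; real exports contain more fields.
good = json.dumps({"name": "demo", "paragraphs": []})
wrapped = json.dumps(good)  # simulates an export that starts with a double quote

print(wrapped.startswith('"'))
print(load_note(wrapped) == load_note(good))
```

Writing the result back with json.dump would then give a file whose first character is `{`, which is the shape of the attached notebook that imported correctly.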


Re: note imports broken?

2018-05-24 Thread Sanjay Dasgupta
I tried 0.8.0-RC2 to 0.8.0-RC2.

Attempted to import a really small trivial notebook exported just a minute
before from the same zeppelin instance. More details are here
<https://issues.apache.org/jira/browse/ZEPPELIN-1028?focusedCommentId=16478853&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16478853>
.

Thanks

On Thu, May 24, 2018 at 9:35 AM, Jeff Zhang <zjf...@gmail.com> wrote:

>
> Import from which version of zeppelin notes to 0.8 ?
>
>
> On Thu, May 24, 2018 at 11:52 AM, Sanjay Dasgupta <sanjay.dasgu...@gmail.com> wrote:
>
>> I noticed this some time back too, but reported it here
>> <https://issues.apache.org/jira/browse/ZEPPELIN-1028?focusedCommentId=16478853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16478853>
>>
>> On Wed, May 23, 2018 at 10:00 PM, Ruslan Dautkhanov <dautkha...@gmail.com
>> > wrote:
>>
>>> Was anybody able to import notes on 0.8 RC or a recent master snapshot?
>>> Notes import seems to be broken
>>> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3485
>>> This looks serious to me.
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>
>>


Re: note imports broken?

2018-05-23 Thread Sanjay Dasgupta
I noticed this some time back too, but reported it here


On Wed, May 23, 2018 at 10:00 PM, Ruslan Dautkhanov 
wrote:

> Was anybody able to import notes on 0.8 RC or a recent master snapshot?
> Notes import seems to be broken
> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3485
> This looks serious to me.
>
>
> --
> Ruslan Dautkhanov
>


Should max-results = 0 (or max-rows = 0) mean unlimited results?

2018-05-11 Thread Sanjay Dasgupta
Many of the interpreters have a parameter named maxResults, max_no_of_rows,
max_count, etc. whose purpose is to limit the number of output rows displayed
(for example from z.show(...)).

In most (or perhaps all) of these implementations, setting this parameter to 0
causes no output to be displayed at all, as 0 is taken literally.

In certain other contexts, the value 0 in a configuration parameter is often
used as a special indicator meaning "unlimited". We have at least one recent
request for such an interpretation of the value "0" in the maximum output rows
parameter (see https://issues.apache.org/jira/browse/ZEPPELIN-3446).

I would like to ask the user community what they think of making such a change.
How common would such use be? Are there any downsides?

Thanks for your ideas.
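
For discussion, the proposed behaviour could look roughly like the sketch below. It is not taken from any interpreter's code; limit_rows and its max_results argument are hypothetical names standing in for whatever each interpreter calls its limit.

```python
from itertools import islice


def limit_rows(rows, max_results):
    """Return at most max_results rows; 0 is treated as "no limit"."""
    if max_results < 0:
        raise ValueError("max_results must be >= 0")
    if max_results == 0:
        # Proposed meaning: 0 disables the cap instead of suppressing all output.
        return list(rows)
    return list(islice(rows, max_results))


rows = [("a", 1), ("b", 2), ("c", 3)]
print(len(limit_rows(rows, 2)))
print(len(limit_rows(rows, 0)))
```

One possible downside is that an unlimited setting removes the guard against very large results being sent to the front end, so anyone relying on 0 to suppress output today would see a behaviour change.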