Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-30 Thread Sean Busbey
Hi folks!

Thanks for all the feedback and for the additional mentor; I think the proposal
is ready to go to a vote now.

Presuming folks don't have any last-minute questions, I'll post it later today.

-busbey

On 2017-05-23 19:52 (-0500), Kostas Sakellis  wrote: 
> Thank you John for the feedback.
> 
> On Mon, May 22, 2017 at 12:30 PM, Sean Busbey  wrote:
> 
> > On 2017-05-21 09:46 (-0500), "John D. Ament" 
> > wrote:
> > >
> > > On Fri, May 19, 2017 at 7:45 PM Sean Busbey  wrote:
> > > >
> > > > == Reliance on salaried Developers ==
> > > >
> > > > The existing contributors to the Livy project have been made by
> > salaried
> > > > engineers from Cloudera, Microsoft and Hortonworks. Since there are
> > three
> > > > major organisations involved, the risk of reliance on a single group of
> > > > salaried developers is mitigated. The Livy user base is diverse, with
> > users
> > > > from across the globe, including users from academic settings. We aim
> > to
> > > > further diversify the Livy user and contributor base.
> > > >
> > >
> > > There's a disconnect between this paragraph and the initial committers
> > > list. Specifically, no one from Microsoft is represented (as best as I
> > can
> > > tell).
> >
> > Ah, this lack of clarity is my fault as an editor. One of the initial
> > committers was recently employed by Microsoft but is now in the
> > process of changing employers. Another person formally affiliated with
> > both the project and Microsoft has decided not to continue
> > participating.
> >
> > I could rephrase this to talk about the contributions made to date as
> > being from individuals then in the employ of three major companies.
> > Then call out the initial committer list as from two of those and one
> > unaffiliated. Would that read clearer?
> >
> 
> I modified this section in the proposal to read: "The contributions to the
> Livy project to date have been made by salaried engineers from Cloudera,
> Microsoft and Hortonworks. One of the individuals on the initial committer
> list has since left Microsoft and is currently unaffiliated. The remaining
> contributors are from Cloudera and Hortonworks. Since there are at least
> two major organizations involved, the risk of reliance on a single group of
> salaried developers is mitigated. The Livy user base is diverse, with users
> from across the globe, including users from academic settings. We aim to
> further diversify the Livy user and contributor base."
> 
> 
> >
> > > > Cloudera currently owns the domain name: http://livy.io/ which will be
> > > > transferred to the ASF and redirected to the official page during
> > > > incubation.
> > > >
> > > >
> > >
> > > I'm assuming that the incoming project is aware that we expect the main
> > dev
> > > landing page to be livy.incubator.apache.org . We will want to track
> > this
> > > as a project specific item.
> >
> >
> > Yep, once all the docs are moved over to ASF infrastructure we can
> > just have the current domain act as a redirect. Should I call this out
> > in the proposal?
> >
> I modified the proposal to say: "Cloudera currently owns the domain name:
> http://livy.io/. Once all the documentation has moved over to ASF
> infrastructure, the main landing page will become livy.incubator.apache.org
> and the old domain will just act as a redirect."
> 
> 
> >
> > > > == Git Repository ==
> > > >
> > > > git://git.apache.org/livy
> > > >
> > >
> > > Just to confirm - it'll be incubator-livy, not just livy.
> >
> > right right. I'll correct this when making the other edits.
> >
> Added the correction to the proposal
> 
> 
> >
> > > > == Issue Tracking ==
> > > >
> > > > We would like to import our current JIRA project into the ASF JIRA,
> > such
> > > > that our historical commit message and code comments continue to
> > reference
> > > > the appropriate bug numbers.
> > > >
> > >
> > > I would recommend reaching out to infra to see if the import is possible
> > > before voting on the project. Otherwise you'll need to list out an la
> >
> > Sure I can reach out. I've seen this done a few times, so I consider
> > it low risk. The end of your line appears to have been lost, what's
> > the "Otherwise..." ?
> >
> > > > = Sponsors =
> > > > == Champion ==
> > > >
> > > > * Sean Busbey (bus...@apache.org)
> > > >
> > > > == Nominated Mentors ==
> > > >
> > > > * Bikas Saha (bi...@apache.org)
> > > > * Brock Noland (br...@phdata.io)
> > >
> > >
> > > A couple of points:
> > >
> > > - Sean, while the champion and mentor roles are separate, we do hope that
> > > all champions will continue on as a mentor. If this is your intention
> > > please add yourself.
> >
> > After having to withdraw from mentoring a couple of podlings at the
> > end of last year I am conservative about what my volunteer time looks
> > like right now. I'm certain I can spare the time to help the Livy
> > community get introduced to the 

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-23 Thread Kostas Sakellis
Thank you John for the feedback.

On Mon, May 22, 2017 at 12:30 PM, Sean Busbey  wrote:

> On 2017-05-21 09:46 (-0500), "John D. Ament" 
> wrote:
> >
> > On Fri, May 19, 2017 at 7:45 PM Sean Busbey  wrote:
> > >
> > > == Reliance on salaried Developers ==
> > >
> > > The existing contributors to the Livy project have been made by
> salaried
> > > engineers from Cloudera, Microsoft and Hortonworks. Since there are
> three
> > > major organisations involved, the risk of reliance on a single group of
> > > salaried developers is mitigated. The Livy user base is diverse, with
> users
> > > from across the globe, including users from academic settings. We aim
> to
> > > further diversify the Livy user and contributor base.
> > >
> >
> > There's a disconnect between this paragraph and the initial committers
> > list. Specifically, no one from Microsoft is represented (as best as I
> can
> > tell).
>
> Ah, this lack of clarity is my fault as an editor. One of the initial
> committers was recently employed by Microsoft but is now in the
> process of changing employers. Another person formally affiliated with
> both the project and Microsoft has decided not to continue
> participating.
>
> I could rephrase this to talk about the contributions made to date as
> being from individuals then in the employ of three major companies.
> Then call out the initial committer list as from two of those and one
> unaffiliated. Would that read clearer?
>

I modified this section in the proposal to read: "The contributions to the
Livy project to date have been made by salaried engineers from Cloudera,
Microsoft and Hortonworks. One of the individuals on the initial committer
list has since left Microsoft and is currently unaffiliated. The remaining
contributors are from Cloudera and Hortonworks. Since there are at least
two major organizations involved, the risk of reliance on a single group of
salaried developers is mitigated. The Livy user base is diverse, with users
from across the globe, including users from academic settings. We aim to
further diversify the Livy user and contributor base."


>
> > > Cloudera currently owns the domain name: http://livy.io/ which will be
> > > transferred to the ASF and redirected to the official page during
> > > incubation.
> > >
> > >
> >
> > I'm assuming that the incoming project is aware that we expect the main
> dev
> > landing page to be livy.incubator.apache.org . We will want to track
> this
> > as a project specific item.
>
>
> Yep, once all the docs are moved over to ASF infrastructure we can
> just have the current domain act as a redirect. Should I call this out
> in the proposal?
>
I modified the proposal to say: "Cloudera currently owns the domain name:
http://livy.io/. Once all the documentation has moved over to ASF
infrastructure, the main landing page will become livy.incubator.apache.org
and the old domain will just act as a redirect."


>
> > > == Git Repository ==
> > >
> > > git://git.apache.org/livy
> > >
> >
> > Just to confirm - it'll be incubator-livy, not just livy.
>
> right right. I'll correct this when making the other edits.
>
Added the correction to the proposal


>
> > > == Issue Tracking ==
> > >
> > > We would like to import our current JIRA project into the ASF JIRA,
> such
> > > that our historical commit message and code comments continue to
> reference
> > > the appropriate bug numbers.
> > >
> >
> > I would recommend reaching out to infra to see if the import is possible
> > before voting on the project. Otherwise you'll need to list out an la
>
> Sure I can reach out. I've seen this done a few times, so I consider
> it low risk. The end of your line appears to have been lost, what's
> the "Otherwise..." ?
>
> > > = Sponsors =
> > > == Champion ==
> > >
> > > * Sean Busbey (bus...@apache.org)
> > >
> > > == Nominated Mentors ==
> > >
> > > * Bikas Saha (bi...@apache.org)
> > > * Brock Noland (br...@phdata.io)
> >
> >
> > A couple of points:
> >
> > - Sean, while the champion and mentor roles are separate, we do hope that
> > all champions will continue on as a mentor. If this is your intention
> > please add yourself.
>
> After having to withdraw from mentoring a couple of podlings at the
> end of last year I am conservative about what my volunteer time looks
> like right now. I'm certain I can spare the time to help the Livy
> community get introduced to the incubator. I'm not certain beyond
> that, so I am not listed as a formal mentor.
>
> >  - All mentors must be on the IPMC. Foundation membership isn't a
> > requirement, however most people use membership to get access to the
> IPMC.
> > If Bikas wants to be a mentor, he'll need to join the IPMC otherwise
> you'll
> > need to find 2 mentors.
>
> I was pretty sure Bikas had already done this step. I'll chase this
> down and find the disconnect.
>
> > - Do the proposed mentors have a relationship to the incoming project,
> e.g.
> > do 

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread Kostas Sakellis
On Mon, May 22, 2017 at 1:01 AM, Luciano Resende 
wrote:

> +1
>
> Also, I see the proposal is short on mentors, so feel free to include me as
> a mentor for the project.
>
> Thanks
>

Thanks Luciano! Welcome aboard.


> On Fri, May 19, 2017 at 4:45 PM Sean Busbey  wrote:
>
> > Dear Apache Incubator Community,
> >
> > I'm excited to present for discussion a proposal to move Livy into
> > incubation. Livy is a web service that exposes a REST interface for
> managing
> > long running Apache Spark contexts in your cluster. With Livy, new
> > applications can be built on top of Apache Spark that require fine
> grained
> > interaction with many Spark contexts.
> >
> > The proposal is on the wiki at the following page as well as copied in
> the
> > email below:
> >
> > https://wiki.apache.org/incubator/LivyProposal
> >
> > In addition to welcoming feedback on the proposal, we are actively
> seeking
> > one or more additional mentors. We also have included a section for
> > interested folks to ensure they get added to the mailing lists, presuming
> > Livy gets accepted for incubation.
> >
> >  LivyProposal
> >
> > = Abstract =
> >
> > Livy is a web service that exposes a REST interface for managing
> > long running Apache Spark contexts in your cluster. With Livy, new
> > applications can be built on top of Apache Spark that require fine
> grained
> > interaction with many Spark contexts.
> >
> > = Proposal =
> >
> > Livy is an open-source REST service for Apache Spark. Livy
> > enables applications to submit Spark applications and retrieve results
> > without a co-location requirement on the Spark cluster.
> >
> > We propose to contribute the Livy codebase and associated artifacts (e.g.
> > documentation, web-site context etc) to the Apache Software Foundation.
> >
> > = Background =
> >
> > Apache Spark is a fast and general purpose distributed
> > compute engine, with a versatile API. It enables processing of large
> > quantities of static data distributed over a cluster of machines, as well
> > as
> > processing of continuous streams of data. It is the preferred distributed
> > data processing engine for data engineering, stream processing and data
> > science workloads. Each Spark application uses a construct called the
> > SparkContext, which is the application’s connection or entry point to the
> > Spark engine. Each Spark application will have its own SparkContext.
> >
> > Livy enables clients to interact with one or more Spark sessions through
> > the
> > Livy Server, which acts as a proxy layer. Livy Clients have fine grained
> > control over the lifecycle of the Spark sessions, as well as the ability
> to
> > submit jobs and retrieve results, all over HTTP.  Clients have two modes
> of
> > interaction: RPC Client API, available in Java and Python, which allows
> > results to be retrieved as Java or Python objects. The serialization and
> > deserialization of the results is handled by the Livy framework.  HTTP
> > based
> > API that allows submission of code snippets, and retrieval of the results
> > in
> > different formats.
> >
> > Multi-tenant resource allocation and security: Livy enables multiple
> > independent Spark sessions to be managed simultaneously. Multiple clients
> > can also interact simultaneously with the same Spark session and share
> the
> > resources of that Spark session. Livy can also enforce secure,
> > authenticated
> > communication between the clients and their respective Spark sessions.
> >
> > More information on Livy can be found at the existing open source
> website:
> > http://livy.io/
> >
> > = Rationale =
> >
> > Users want to use Spark’s powerful processing engine and API
> > as the data processing backend for interactive applications. However, the
> > job submission and application interaction mechanisms built into Apache
> > Spark are insufficient and cumbersome for multi-user interactive
> > applications.
> >
> > The primary mechanism for applications to submit Spark jobs is via
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html),
> which
> > is
> > available as a command line tool as well as a programmatic API. However,
> > spark-submit has the following limitations that make it difficult to
> build
> > interactive applications: It is slow: each invocation of spark-submit
> > involves a setup phase where cluster resources are acquired, new
> processes
> > are forked, etc. This setup phase runs for many seconds, or even minutes,
> > and hence is too slow for interactive applications.  It is cumbersome and
> > lacks flexibility: application code and dependencies have to be
> > pre-compiled
> > and submitted as jars, and can not be submitted interactively.
> >
> > Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> > SQL
> > queries to Spark. However, this solution is limited to SQL and does not
> > allow the client to leverage the rest of the Spark API, such as RDDs,

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread John D. Ament
On Mon, May 22, 2017 at 3:30 PM Sean Busbey  wrote:

> On 2017-05-21 09:46 (-0500), "John D. Ament" 
> wrote:
> >
> > On Fri, May 19, 2017 at 7:45 PM Sean Busbey  wrote:
> > >
> > > == Reliance on salaried Developers ==
> > >
> > > The existing contributors to the Livy project have been made by
> salaried
> > > engineers from Cloudera, Microsoft and Hortonworks. Since there are
> three
> > > major organisations involved, the risk of reliance on a single group of
> > > salaried developers is mitigated. The Livy user base is diverse, with
> users
> > > from across the globe, including users from academic settings. We aim
> to
> > > further diversify the Livy user and contributor base.
> > >
> >
> > There's a disconnect between this paragraph and the initial committers
> > list. Specifically, no one from Microsoft is represented (as best as I
> can
> > tell).
>
> Ah, this lack of clarity is my fault as an editor. One of the initial
> committers was recently employed by Microsoft but is now in the
> process of changing employers. Another person formally affiliated with
> both the project and Microsoft has decided not to continue
> participating.
>
> I could rephrase this to talk about the contributions made to date as
> being from individuals then in the employ of three major companies.
> Then call out the initial committer list as from two of those and one
> unaffiliated. Would that read clearer?
>
>
I think the explanation is enough, nothing to change.


> > > Cloudera currently owns the domain name: http://livy.io/ which will be
> > > transferred to the ASF and redirected to the official page during
> > > incubation.
> > >
> > >
> >
> > I'm assuming that the incoming project is aware that we expect the main
> dev
> > landing page to be livy.incubator.apache.org . We will want to track
> this
> > as a project specific item.
>
>
> Yep, once all the docs are moved over to ASF infrastructure we can
> just have the current domain act as a redirect. Should I call this out
> in the proposal?
>
>
It would be good to list the goal of moving to livy.i.a.o, but it isn't
required.


>
> > > == Git Repository ==
> > >
> > > git://git.apache.org/livy
> > >
> >
> > Just to confirm - it'll be incubator-livy, not just livy.
>
> right right. I'll correct this when making the other edits.
>
>
> > > == Issue Tracking ==
> > >
> > > We would like to import our current JIRA project into the ASF JIRA,
> such
> > > that our historical commit message and code comments continue to
> reference
> > > the appropriate bug numbers.
> > >
> >
> > I would recommend reaching out to infra to see if the import is possible
> > before voting on the project. Otherwise you'll need to list out an la
>
> Sure I can reach out. I've seen this done a few times, so I consider
> it low risk. The end of your line appears to have been lost, what's
> the "Otherwise..." ?
>
>
Yeah, I'm not sure what happened there either.  I checked my mail client,
E_NOCLUE.

"list out an alternative" is what I was typing.


> > > = Sponsors =
> > > == Champion ==
> > >
> > > * Sean Busbey (bus...@apache.org)
> > >
> > > == Nominated Mentors ==
> > >
> > > * Bikas Saha (bi...@apache.org)
> > > * Brock Noland (br...@phdata.io)
> >
> >
> > A couple of points:
> >
> > - Sean, while the champion and mentor roles are separate, we do hope that
> > all champions will continue on as a mentor. If this is your intention
> > please add yourself.
>
> After having to withdraw from mentoring a couple of podlings at the
> end of last year I am conservative about what my volunteer time looks
> like right now. I'm certain I can spare the time to help the Livy
> community get introduced to the incubator. I'm not certain beyond
> that, so I am not listed as a formal mentor.
>
> >  - All mentors must be on the IPMC. Foundation membership isn't a
> > requirement, however most people use membership to get access to the
> IPMC.
> > If Bikas wants to be a mentor, he'll need to join the IPMC otherwise
> you'll
> > need to find 2 mentors.
>
> I was pretty sure Bikas had already done this step. I'll chase this
> down and find the disconnect.
>
> > - Do the proposed mentors have a relationship to the incoming project,
> e.g.
> > do they care if it succeeds from a corporate interest standpoint?
>
> I'll let the mentors answer for themselves here, as I won't presume to
> know their specific motivations for volunteering.
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread Sean Busbey
On 2017-05-21 09:46 (-0500), "John D. Ament"  wrote:
>
> On Fri, May 19, 2017 at 7:45 PM Sean Busbey  wrote:
> >
> > == Reliance on salaried Developers ==
> >
> > The existing contributors to the Livy project have been made by salaried
> > engineers from Cloudera, Microsoft and Hortonworks. Since there are three
> > major organisations involved, the risk of reliance on a single group of
> > salaried developers is mitigated. The Livy user base is diverse, with users
> > from across the globe, including users from academic settings. We aim to
> > further diversify the Livy user and contributor base.
> >
>
> There's a disconnect between this paragraph and the initial committers
> list. Specifically, no one from Microsoft is represented (as best as I can
> tell).

Ah, this lack of clarity is my fault as an editor. One of the initial
committers was recently employed by Microsoft but is now in the
process of changing employers. Another person formally affiliated with
both the project and Microsoft has decided not to continue
participating.

I could rephrase this to talk about the contributions made to date as
being from individuals then in the employ of three major companies.
Then call out the initial committer list as from two of those and one
unaffiliated. Would that read clearer?

> > Cloudera currently owns the domain name: http://livy.io/ which will be
> > transferred to the ASF and redirected to the official page during
> > incubation.
> >
> >
>
> I'm assuming that the incoming project is aware that we expect the main dev
> landing page to be livy.incubator.apache.org . We will want to track this
> as a project specific item.


Yep, once all the docs are moved over to ASF infrastructure we can
just have the current domain act as a redirect. Should I call this out
in the proposal?


> > == Git Repository ==
> >
> > git://git.apache.org/livy
> >
>
> Just to confirm - it'll be incubator-livy, not just livy.

right right. I'll correct this when making the other edits.


> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit message and code comments continue to reference
> > the appropriate bug numbers.
> >
>
> I would recommend reaching out to infra to see if the import is possible
> before voting on the project. Otherwise you'll need to list out an la

Sure, I can reach out. I've seen this done a few times, so I consider
it low risk. The end of your line appears to have been lost; what's
the "Otherwise..."?

> > = Sponsors =
> > == Champion ==
> >
> > * Sean Busbey (bus...@apache.org)
> >
> > == Nominated Mentors ==
> >
> > * Bikas Saha (bi...@apache.org)
> > * Brock Noland (br...@phdata.io)
>
>
> A couple of points:
>
> - Sean, while the champion and mentor roles are separate, we do hope that
> all champions will continue on as a mentor. If this is your intention
> please add yourself.

After having to withdraw from mentoring a couple of podlings at the
end of last year I am conservative about what my volunteer time looks
like right now. I'm certain I can spare the time to help the Livy
community get introduced to the incubator. I'm not certain beyond
that, so I am not listed as a formal mentor.

>  - All mentors must be on the IPMC. Foundation membership isn't a
> requirement, however most people use membership to get access to the IPMC.
> If Bikas wants to be a mentor, he'll need to join the IPMC otherwise you'll
> need to find 2 mentors.

I was pretty sure Bikas had already done this step. I'll chase this
down and find the disconnect.

> - Do the proposed mentors have a relationship to the incoming project, e.g.
> do they care if it succeeds from a corporate interest standpoint?

I'll let the mentors answer for themselves here, as I won't presume to
know their specific motivations for volunteering.




Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread Sean Busbey
On 2017-05-22 03:01 (-0500), Luciano Resende  wrote:
> +1
>
> Also, I see the proposal is short on mentors, so feel free to include me as
> a mentor for the project.
>
> Thanks
>


Thanks Luciano! I've added you to the wiki page as a mentor.




Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread Jitendra Pandey
+1

On 5/22/17, 1:01 AM, "Luciano Resende"  wrote:

+1

Also, I see the proposal is short on mentors, so feel free to include me as
a mentor for the project.

Thanks

On Fri, May 19, 2017 at 4:45 PM Sean Busbey  wrote:

> Dear Apache Incubator Community,
>
> I'm excited to present for discussion a proposal to move Livy into
> incubation. Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> The proposal is on the wiki at the following page as well as copied in the
> email below:
>
> https://wiki.apache.org/incubator/LivyProposal
>
> In addition to welcoming feedback on the proposal, we are actively seeking
> one or more additional mentors. We also have included a section for
> interested folks to ensure they get added to the mailing lists, presuming
> Livy gets accepted for incubation.
>
>  LivyProposal
>
> = Abstract =
>
> Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> = Proposal =
>
> Livy is an open-source REST service for Apache Spark. Livy
> enables applications to submit Spark applications and retrieve results
> without a co-location requirement on the Spark cluster.
>
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, web-site context etc) to the Apache Software Foundation.
>
> = Background =
>
> Apache Spark is a fast and general purpose distributed
> compute engine, with a versatile API. It enables processing of large
> quantities of static data distributed over a cluster of machines, as well
> as
> processing of continuous streams of data. It is the preferred distributed
> data processing engine for data engineering, stream processing and data
> science workloads. Each Spark application uses a construct called the
> SparkContext, which is the application’s connection or entry point to the
> Spark engine. Each Spark application will have its own SparkContext.
>
> Livy enables clients to interact with one or more Spark sessions through
> the
> Livy Server, which acts as a proxy layer. Livy Clients have fine grained
> control over the lifecycle of the Spark sessions, as well as the ability to
> submit jobs and retrieve results, all over HTTP.  Clients have two modes of
> interaction: RPC Client API, available in Java and Python, which allows
> results to be retrieved as Java or Python objects. The serialization and
> deserialization of the results is handled by the Livy framework.  HTTP
> based
> API that allows submission of code snippets, and retrieval of the results
> in
> different formats.
>
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple clients
> can also interact simultaneously with the same Spark session and share the
> resources of that Spark session. Livy can also enforce secure,
> authenticated
> communication between the clients and their respective Spark sessions.
>
> More information on Livy can be found at the existing open source website:
> http://livy.io/
>
> = Rationale =
>
> Users want to use Spark’s powerful processing engine and API
> as the data processing backend for interactive applications. However, the
> job submission and application interaction mechanisms built into Apache
> Spark are insufficient and cumbersome for multi-user interactive
> applications.
>
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which
> is
> available as a command line tool as well as a programmatic API. However,
> spark-submit has the following limitations that make it difficult to build
> interactive applications: It is slow: each invocation of spark-submit
> involves a setup phase where cluster resources are acquired, new processes
> are forked, etc. This setup phase runs for many seconds, or even minutes,
> and hence is too slow for interactive applications.  It is cumbersome and
> lacks flexibility: application code and dependencies have to be
> pre-compiled
> and submitted as jars, and can not be submitted interactively.
>
> Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> SQL
  

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-22 Thread Luciano Resende
+1

Also, I see the proposal is short on mentors, so feel free to include me as
a mentor for the project.

Thanks

On Fri, May 19, 2017 at 4:45 PM Sean Busbey  wrote:

> Dear Apache Incubator Community,
>
> I'm excited to present for discussion a proposal to move Livy into
> incubation. Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> The proposal is on the wiki at the following page as well as copied in the
> email below:
>
> https://wiki.apache.org/incubator/LivyProposal
>
> In addition to welcoming feedback on the proposal, we are actively seeking
> one or more additional mentors. We also have included a section for
> interested folks to ensure they get added to the mailing lists, presuming
> Livy gets accepted for incubation.
>
>  LivyProposal
>
> = Abstract =
>
> Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> = Proposal =
>
> Livy is an open-source REST service for Apache Spark. Livy
> enables applications to submit Spark applications and retrieve results
> without a co-location requirement on the Spark cluster.
>
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, web-site context etc) to the Apache Software Foundation.
>
> = Background =
>
> Apache Spark is a fast and general purpose distributed
> compute engine, with a versatile API. It enables processing of large
> quantities of static data distributed over a cluster of machines, as well
> as
> processing of continuous streams of data. It is the preferred distributed
> data processing engine for data engineering, stream processing and data
> science workloads. Each Spark application uses a construct called the
> SparkContext, which is the application’s connection or entry point to the
> Spark engine. Each Spark application will have its own SparkContext.
>
> Livy enables clients to interact with one or more Spark sessions through
> the
> Livy Server, which acts as a proxy layer. Livy Clients have fine grained
> control over the lifecycle of the Spark sessions, as well as the ability to
> submit jobs and retrieve results, all over HTTP.  Clients have two modes of
> interaction: RPC Client API, available in Java and Python, which allows
> results to be retrieved as Java or Python objects. The serialization and
> deserialization of the results is handled by the Livy framework.  HTTP
> based
> API that allows submission of code snippets, and retrieval of the results
> in
> different formats.
>
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple clients
> can also interact simultaneously with the same Spark session and share the
> resources of that Spark session. Livy can also enforce secure,
> authenticated
> communication between the clients and their respective Spark sessions.
>
> More information on Livy can be found at the existing open source website:
> http://livy.io/
>
> = Rationale =
>
> Users want to use Spark’s powerful processing engine and API
> as the data processing backend for interactive applications. However, the
> job submission and application interaction mechanisms built into Apache
> Spark are insufficient and cumbersome for multi-user interactive
> applications.
>
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which
> is available as a command line tool as well as a programmatic API.
> However, spark-submit has two limitations that make it difficult to build
> interactive applications. First, it is slow: each invocation of
> spark-submit involves a setup phase where cluster resources are acquired,
> new processes are forked, etc. This setup phase runs for many seconds, or
> even minutes, and hence is too slow for interactive applications. Second,
> it is cumbersome and lacks flexibility: application code and dependencies
> have to be pre-compiled and submitted as jars, and cannot be submitted
> interactively.
>
> Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> SQL
> queries to Spark. However, this solution is limited to SQL and does not
> allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
> and Streaming.
>
> A third way of using Spark is via its command-line shell, which allows the
> interactive submission of snippets of Spark code. However, the shell
> entails
> running Spark code on the client machine and hence is not a viable
> mechanism
> for remote clients to submit Spark jobs.
>
> Livy solves the limitations of the 

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-21 Thread John D. Ament
Sean,

On Fri, May 19, 2017 at 7:45 PM Sean Busbey  wrote:

> Dear Apache Incubator Community,
>
> I'm excited to present for discussion a proposal to move Livy into
> incubation. Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> The proposal is on the wiki at the following page as well as copied in the
> email below:
>
> https://wiki.apache.org/incubator/LivyProposal
>
> In addition to welcoming feedback on the proposal, we are actively seeking
> one or more additional mentors. We also have included a section for
> interested folks to ensure they get added to the mailing lists, presuming
> Livy gets accepted for incubation.
>
>  LivyProposal
>
> = Abstract =
>
> Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine grained
> interaction with many Spark contexts.
>
> = Proposal =
>
> Livy is an open-source REST service for Apache Spark. Livy
> enables applications to submit Spark applications and retrieve results
> without a co-location requirement on the Spark cluster.
>
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, website content, etc.) to the Apache Software Foundation.
>
> = Background =
>
> Apache Spark is a fast and general purpose distributed
> compute engine, with a versatile API. It enables processing of large
> quantities of static data distributed over a cluster of machines, as well
> as
> processing of continuous streams of data. It is the preferred distributed
> data processing engine for data engineering, stream processing and data
> science workloads. Each Spark application uses a construct called the
> SparkContext, which is the application’s connection or entry point to the
> Spark engine. Each Spark application will have its own SparkContext.
>
> Livy enables clients to interact with one or more Spark sessions through
> the Livy Server, which acts as a proxy layer. Livy clients have
> fine-grained control over the lifecycle of the Spark sessions, as well as
> the ability to submit jobs and retrieve results, all over HTTP. Clients
> have two modes of interaction. The first is an RPC Client API, available
> in Java and Python, which allows results to be retrieved as Java or Python
> objects; the serialization and deserialization of the results is handled
> by the Livy framework. The second is an HTTP-based API that allows
> submission of code snippets and retrieval of the results in different
> formats.
>
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple clients
> can also interact simultaneously with the same Spark session and share the
> resources of that Spark session. Livy can also enforce secure,
> authenticated
> communication between the clients and their respective Spark sessions.
>
> More information on Livy can be found at the existing open source website:
> http://livy.io/
>
> = Rationale =
>
> Users want to use Spark’s powerful processing engine and API
> as the data processing backend for interactive applications. However, the
> job submission and application interaction mechanisms built into Apache
> Spark are insufficient and cumbersome for multi-user interactive
> applications.
>
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which
> is available as a command line tool as well as a programmatic API.
> However, spark-submit has two limitations that make it difficult to build
> interactive applications. First, it is slow: each invocation of
> spark-submit involves a setup phase where cluster resources are acquired,
> new processes are forked, etc. This setup phase runs for many seconds, or
> even minutes, and hence is too slow for interactive applications. Second,
> it is cumbersome and lacks flexibility: application code and dependencies
> have to be pre-compiled and submitted as jars, and cannot be submitted
> interactively.
>
> Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> SQL
> queries to Spark. However, this solution is limited to SQL and does not
> allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
> and Streaming.
>
> A third way of using Spark is via its command-line shell, which allows the
> interactive submission of snippets of Spark code. However, the shell
> entails
> running Spark code on the client machine and hence is not a viable
> mechanism
> for remote clients to submit Spark jobs.
>
> Livy solves the limitations of the above three mechanisms, and provides the
> full Spark API as a multi-tenant service to remote clients.
>
> 

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-21 Thread Brock Noland
Great to see!

+1

On Fri, May 19, 2017 at 7:24 PM, William GUO  wrote:

> +1
>
> Griffin needs Livy to access Spark context.
>
>
> Thanks,
> William
>
> On 5/20/17, 7:45 AM, "Sean Busbey"  wrote:
>
> Dear Apache Incubator Community,
>
> I'm excited to present for discussion a proposal to move Livy into
> incubation. Livy is a web service that exposes a REST interface for
> managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine
> grained
> interaction with many Spark contexts.
>
> The proposal is on the wiki at the following page as well as copied in
> the
> email below:
>
> https://wiki.apache.org/incubator/LivyProposal
>
> In addition to welcoming feedback on the proposal, we are actively
> seeking
> one or more additional mentors. We also have included a section for
> interested folks to ensure they get added to the mailing lists,
> presuming
> Livy gets accepted for incubation.
>
>  LivyProposal
>
> = Abstract =
>
> Livy is a web service that exposes a REST interface for managing
> long running Apache Spark contexts in your cluster. With Livy, new
> applications can be built on top of Apache Spark that require fine
> grained
> interaction with many Spark contexts.
>
> = Proposal =
>
> Livy is an open-source REST service for Apache Spark. Livy
> enables applications to submit Spark applications and retrieve results
> without a co-location requirement on the Spark cluster.
>
> We propose to contribute the Livy codebase and associated artifacts (e.g.
> documentation, website content, etc.) to the Apache Software Foundation.
>
> = Background =
>
> Apache Spark is a fast and general purpose distributed
> compute engine, with a versatile API. It enables processing of large
> quantities of static data distributed over a cluster of machines, as
> well as
> processing of continuous streams of data. It is the preferred
> distributed
> data processing engine for data engineering, stream processing and data
> science workloads. Each Spark application uses a construct called the
> SparkContext, which is the application’s connection or entry point
> to the
> Spark engine. Each Spark application will have its own SparkContext.
>
> Livy enables clients to interact with one or more Spark sessions through
> the Livy Server, which acts as a proxy layer. Livy clients have
> fine-grained control over the lifecycle of the Spark sessions, as well as
> the ability to submit jobs and retrieve results, all over HTTP. Clients
> have two modes of interaction. The first is an RPC Client API, available
> in Java and Python, which allows results to be retrieved as Java or Python
> objects; the serialization and deserialization of the results is handled
> by the Livy framework. The second is an HTTP-based API that allows
> submission of code snippets and retrieval of the results in different
> formats.
>
> Multi-tenant resource allocation and security: Livy enables multiple
> independent Spark sessions to be managed simultaneously. Multiple
> clients
> can also interact simultaneously with the same Spark session and share
> the
> resources of that Spark session. Livy can also enforce secure,
> authenticated
> communication between the clients and their respective Spark sessions.
>
> More information on Livy can be found at the existing open source
> website:
> http://livy.io/
>
> = Rationale =
>
> Users want to use Spark’s powerful processing engine and API
> as the data processing backend for interactive applications. However,
> the
> job submission and application interaction mechanisms built into Apache
> Spark are insufficient and cumbersome for multi-user interactive
> applications.
>
> The primary mechanism for applications to submit Spark jobs is via
> spark-submit
> (http://spark.apache.org/docs/latest/submitting-applications.html), which
> is available as a command line tool as well as a programmatic API.
> However, spark-submit has two limitations that make it difficult to build
> interactive applications. First, it is slow: each invocation of
> spark-submit involves a setup phase where cluster resources are acquired,
> new processes are forked, etc. This setup phase runs for many seconds, or
> even minutes, and hence is too slow for interactive applications. Second,
> it is cumbersome and lacks flexibility: application code and dependencies
> have to be pre-compiled and submitted as jars, and cannot be submitted
> interactively.
>
> Apache Spark comes with an ODBC/JDBC server, which can be used to
> submit SQL
> queries to Spark. However, this solution is limited to SQL and does not
> allow the client to leverage the rest 

Re: [PROPOSAL] Livy Proposal for Apache Incubator

2017-05-19 Thread William GUO
+1 

Griffin needs Livy to access Spark context.


Thanks,
William

On 5/20/17, 7:45 AM, "Sean Busbey"  wrote:

Dear Apache Incubator Community,

I'm excited to present for discussion a proposal to move Livy into
incubation. Livy is a web service that exposes a REST interface for managing
long running Apache Spark contexts in your cluster. With Livy, new
applications can be built on top of Apache Spark that require fine grained
interaction with many Spark contexts.

The proposal is on the wiki at the following page as well as copied in the
email below:

https://wiki.apache.org/incubator/LivyProposal

In addition to welcoming feedback on the proposal, we are actively seeking
one or more additional mentors. We also have included a section for
interested folks to ensure they get added to the mailing lists, presuming
Livy gets accepted for incubation.

 LivyProposal

= Abstract =

Livy is a web service that exposes a REST interface for managing
long running Apache Spark contexts in your cluster. With Livy, new
applications can be built on top of Apache Spark that require fine grained
interaction with many Spark contexts.  

= Proposal =

Livy is an open-source REST service for Apache Spark. Livy
enables applications to submit Spark applications and retrieve results
without a co-location requirement on the Spark cluster. 

We propose to contribute the Livy codebase and associated artifacts (e.g.
documentation, website content, etc.) to the Apache Software Foundation.

= Background =

Apache Spark is a fast and general purpose distributed
compute engine, with a versatile API. It enables processing of large
quantities of static data distributed over a cluster of machines, as well as
processing of continuous streams of data. It is the preferred distributed
data processing engine for data engineering, stream processing and data
science workloads. Each Spark application uses a construct called the
SparkContext, which is the application’s connection or entry point to the
Spark engine. Each Spark application will have its own SparkContext.

Livy enables clients to interact with one or more Spark sessions through the
Livy Server, which acts as a proxy layer. Livy clients have fine-grained
control over the lifecycle of the Spark sessions, as well as the ability to
submit jobs and retrieve results, all over HTTP. Clients have two modes of
interaction. The first is an RPC Client API, available in Java and Python,
which allows results to be retrieved as Java or Python objects; the
serialization and deserialization of the results is handled by the Livy
framework. The second is an HTTP-based API that allows submission of code
snippets and retrieval of the results in different formats.
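
As a rough illustration of the HTTP based mode described above (this sketch is
not part of the proposal text), the requests a client would send might look
like the following. The endpoint paths and the "kind" and "code" fields follow
Livy's REST API as I understand it; the server address, the helper names, and
the choice of Python's urllib are assumptions made only for the example. The
helpers build requests without sending them, so nothing here needs a running
server.

```python
import json
from urllib.request import Request

# Assumption: a Livy server reachable at this address (8998 is Livy's usual port).
LIVY_URL = "http://localhost:8998"


def _json_request(method, path, payload=None):
    """Build (but do not send) a JSON request against the Livy server."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session(kind="spark"):
    """Request a new long-running Spark session (hypothetical helper)."""
    return _json_request("POST", "/sessions", {"kind": kind})


def submit_statement(session_id, code):
    """Submit a code snippet to an existing session (hypothetical helper)."""
    return _json_request(
        "POST", f"/sessions/{session_id}/statements", {"code": code}
    )


def get_statement(session_id, statement_id):
    """Poll for the state and result of a submitted snippet (hypothetical helper)."""
    return _json_request(
        "GET", f"/sessions/{session_id}/statements/{statement_id}"
    )


def delete_session(session_id):
    """Shut the session down, releasing its cluster resources (hypothetical helper)."""
    return _json_request("DELETE", f"/sessions/{session_id}")
```

To actually talk to a server, each request object would be passed to
urllib.request.urlopen and the JSON response decoded. Because the session
outlives any single request, repeated snippet submissions avoid the per-job
setup cost that the Rationale section attributes to spark-submit.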

Multi-tenant resource allocation and security: Livy enables multiple
independent Spark sessions to be managed simultaneously. Multiple clients
can also interact simultaneously with the same Spark session and share the
resources of that Spark session. Livy can also enforce secure, authenticated
communication between the clients and their respective Spark sessions.

More information on Livy can be found at the existing open source website:
http://livy.io/

= Rationale =

Users want to use Spark’s powerful processing engine and API
as the data processing backend for interactive applications. However, the
job submission and application interaction mechanisms built into Apache
Spark are insufficient and cumbersome for multi-user interactive
applications.

The primary mechanism for applications to submit Spark jobs is via
spark-submit
(http://spark.apache.org/docs/latest/submitting-applications.html), which is
available as a command line tool as well as a programmatic API. However,
spark-submit has two limitations that make it difficult to build interactive
applications. First, it is slow: each invocation of spark-submit involves a
setup phase where cluster resources are acquired, new processes are forked,
etc. This setup phase runs for many seconds, or even minutes, and hence is
too slow for interactive applications. Second, it is cumbersome and lacks
flexibility: application code and dependencies have to be pre-compiled and
submitted as jars, and cannot be submitted interactively.

Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
queries to Spark. However, this solution is limited to SQL and does not
allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
and Streaming.

A third way of using Spark is via its command-line shell, which allows the
interactive submission of snippets of Spark code. However, the shell entails
running Spark code on