Building on conversations before, during, and after ApacheCon, and looking at the 
post-1.0 Bigtop focus and efforts, I want to lay out a few things and get people's 
comments.  There seems to be some consensus that the project can look towards 
serving end application/data developers more going forward, while continuing the 
tradition of the project's build/pkg/test/deploy roots.

I have spent the past couple of months, and heavily the past 3 or so weeks, 
talking to many different potential end users at meetups, conferences, etc., and 
also having some great conversations with commercial open source vendors that 
are interested in what a "future Bigtop" can be and what it could provide to 
users.

I believe we need to put some focused effort into a few foundational things to 
put the project in a position to move faster and attract a wider range of users 
as well as new contributors.

-----------
CI "2.0"
-----------

The start of this is already underway, based on the work Roman started last year 
and the continuing setup and enhancement of the Bigtop AWS infrastructure; Evans 
has been pushing this along into the 1.0 release.  The speed of getting new 
packages built and kept up to date needs to increase so releases can happen at a 
regular clip, even looking towards user-friendly "ad-hoc" Bigtop builds where 
users could quickly choose the 2, 3, 4, etc. components they want and have a 
stack built around that.

Related to this, I'm hoping the group can come to some agreement on semver-style 
versioning for the project post-1.0.  I think this could set a path forward for 
releases that happen faster, while not holding up the whole train if a single 
"smaller" component has a couple of issues that can't/won't be resolved by the 
main stakeholders or interested parties in that component.  An example might be 
a new Pig or Sqoop having issues: the 1.2 release would still go out the door, 
with 1.2.1 coming days/weeks later once the new Pig or Sqoop was fixed up.
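As a rough sketch of the convention being proposed (the function below is just an 
illustration, not project tooling): a lagging component only ever bumps the patch 
field, so the minor release train keeps moving.

```shell
# Illustrative only: bump the PATCH field of a MAJOR.MINOR.PATCH version,
# the way a 1.2.1 follow-up would trail the scheduled 1.2.0 release.
next_patch() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split "1.2.0" into positional params 1 / 2 / 0
  IFS=$old_ifs
  echo "$1.$2.$(($3 + 1))"
}

release="1.2.0"                    # goes out the door on schedule
followup=$(next_patch "$release")  # ships once the lagging component is fixed
echo "$release -> $followup"       # prints: 1.2.0 -> 1.2.1
```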

---------------------------------------------
Proper package repository hosting
---------------------------------------------

I put together a little test setup based on the 0.8 assets; we can probably 
build off of that with 1.0, working towards the CI automatically posting 
nightly (or just-in-time) builds off latest so people can play around.  
Debs/rpms should be the focal point of the project's output; everything else is 
additive and builds off of that (i.e. a user who says "I am not a Puppet shop 
so I don't care about the modules, but I do my own automation, and if you point 
me to some sane repositories I can do the rest myself with a couple of decent 
getting-started steps").

-----------------------------------------------------------------
Greatly increasing the UX and getting started content
-----------------------------------------------------------------

This is the big one: a new website, focused docs and getting-started examples 
for end users, and other content specifically for contributors.  I will start 
putting some cycles into the new-website JIRA, probably next week, and will try 
to scoot through it and start posting some working examples for feedback once 
something basic is in place.  For those interested in helping out on doc work 
and getting-started content, let me know; I'm looking at subjects like:

   -Developer getting started
         -using the packages
         -using puppet modules and deployment options
         -deploying reference example stacks
         -setting up your own big data CI
         -etc

   -Contributing to Bigtop:
         -how to submit your first patch/pull-request
         -adding a new component (step by step, canned learning component example, etc.)
         -adding tests to an existing component (steps, canned hello world example test, etc.)
         -writing your own test data generator
         -etc
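For the "first patch" doc, the workflow could be sketched roughly like this.  The 
local throwaway repo below stands in for a real clone of apache/bigtop, and 
BIGTOP-XXXX is a placeholder issue id:

```shell
set -e
# Stand-in for: git clone https://github.com/apache/bigtop.git && cd bigtop
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email "dev@example.com" && git config user.name "Dev"
echo base > README && git add README && git commit -qm "initial import"
trunk=$(git symbolic-ref --short HEAD)   # master or main, depending on git config

# 1. create a topic branch named after the JIRA issue (placeholder id)
git checkout -qb BIGTOP-XXXX
# 2. make the change; put the issue id in the commit subject
echo fix >> README
git commit -qam "BIGTOP-XXXX: describe the change"
# 3. generate a patch to attach to the JIRA (or open a pull request instead)
git format-patch "$trunk" --stdout > BIGTOP-XXXX.patch
```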

Those are some thoughts and a couple of initial focal areas that are driving my 
participation in Bigtop.



-----Original Message-----
From: Andrew Purtell [mailto:[email protected]] 
Sent: Tuesday, June 16, 2015 12:02 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting 
the backplane ? Containers?

> thanks andy - i agree with most of your opinions around continuing to
build
standard packages.. but can you clarify what was offensive ?  must be a 
misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does, 
supporting streams, batch,sql all in one" -> This statement deprecates the 
utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain and 
Spark. As a gross generalization it's unlikely to be a helpful statement in any 
case.

It's fine if we all have our favorites, of course. I think we're set up well to 
empirically determine winners and losers, we don't need to make partisan 
statements. Those components that get some user interest in the form of 
contributions that keep them building and happy in Bigtop will stay in. Those 
that do not get the necessary attention will have to be culled out over time 
when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <[email protected]>
wrote:

> thanks andy - i agree with most of your opinions around continuing to 
> build standard packages.. but can you clarify what was offensive ?  
> must be a misinterpretation somewhere.
>
> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
> we have now.   Thats the core product and will be for  the forseeable
> future, absolutely !
>
> 2) The idea (and its just an idea i want to throw out - to keep us on 
> our toes), is that some folks may be interested in hacking around, in 
> a separate branch - on some bleeding edge bigdata deployments - which 
> attempts to incorporate resource managers and  containers as 
> first-class citizens.
>
> Again this is all just ideas - not in any way meant to derail the 
> packaging efforts - but rather - just to gauge folks interest level in 
> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>
>
>
> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <[email protected]>
> wrote:
>
> > > gridgain or spark can do what 90% of the hadoop ecosystem already 
> > > does,
> > supporting streams, batch,sql all in one)
> >
> > If something like this becomes the official position of the Bigtop 
> > project, some day, then it will turn off people. I can see where you 
> > are coming from, I think. Correct me if I'm wrong: We have limited 
> > bandwidth, we should move away from Roman et. al.'s vision of Bigtop 
> > as an inclusive distribution of big data packages, and instead 
> > become highly opinionated and tightly focused. If that's accurate, I 
> > can sum up my concern as
> > follows: To the degree we become more opinionated, the less we may 
> > have
> to
> > look at in terms of inclusion - both software and user communities. 
> > For example, I find the above quoted statement a bit offensive as a
> participant
> > on not-Spark and not-Gridgain projects. I roll my eyes sometimes at 
> > the Docker over-hype. Is there still a place for me here?
> >
> >
> >
> > On Mon, Jun 15, 2015 at 9:22 AM, jay vyas 
> > <[email protected]>
> > wrote:
> >
> >> Hi folks.   Every few months, i try to reboot the conversation about the
> >> next generation of bigtop.
> >>
> >> There are 3 things which i think we should consider : A backplane
> (rather
> >> than deploy to machines, the meaning of the term "ecosystem" in a 
> >> post-spark in-memory apacolypse, and containerization.
> >>
> >> 1) BACKPLANE: The new trend is to have a backplane that provides 
> >> networking abstractions for you (mesos, kubernetes, yarn, and so on).
>  Is
> >> it time for us to pick a resource manager?
> >>
> >> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole 
> >> hadoop ecosystem, and there is a huge shift to in-memory, 
> >> monolithic stacks happening (i.e. gridgain or spark can do what 90% 
> >> of the hadoop
> ecosystem
> >> already does, supporting streams, batch,sql all in one).
> >>
> >> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
> >> Is it time to start experimenting with running docker tarballs ?
> >>
> >> Combining 1+2+3 - i could see a useful bigdata upstream distro 
> >> which (1) just installed an HCFS implementation (gluster,HDFS,...) 
> >> along side,
> say,
> >> (2) mesos as a backplane for the tooling for [[ hbase + spark + 
> >> ignite
> ]]
> >> --- and then (3) do the integration testing of available 
> >> mesos-framework plugins for ignite and spark underneath.  If other 
> >> folks are interested, maybe we could create the "1x" or "in-memory" 
> >> branch to start hacking
> on it
> >> sometime ?    Maybe even bring the flink guys in as well, as they are
> >> interested in bigtop packaging.
> >>
> >>
> >>
> >> --
> >> jay vyas
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet 
> > Hein (via Tom White)
> >
>
>
>
> --
> jay vyas
>



--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)
