Thanks Nate for this focused writeup!

Yeah maybe it is time to reboot our brains ...

In addition to Nate's points, I would like to attack this in Bigtop 1.1.0:

-------------------------------------------
Building from source or downloading?
-------------------------------------------

However, we have a substantial problem hidden deep in the CI "2.0" approach 
using containers.

You may know that we place artifacts (i.e. jars) built with Bigtop into the 
local Maven cache ~/.m2 (look for "mvn install" in do-component-build). The 
idea is that later Maven builds will pick up these artifacts and use them 
rather than downloading them from Maven Central.
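
For illustration, the relevant part of a do-component-build typically looks 
roughly like this (a sketch; the exact goals and flags vary per component):

    # sketch of a typical do-component-build invocation (flags vary)
    mvn clean install -DskipTests "$@"
    # "install" copies the built jars into ~/.m2/repository, so later
    # Maven builds on the same host resolve them locally instead of
    # downloading them from Maven Central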

Placing artifacts into ~/.m2 will not have any effect if we use CI containers 
the way we do now: the Maven cache ~/.m2 is lost when the container ends.
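
To make this concrete, a containerized build along these lines (a sketch, not 
our exact CI invocation) throws the populated cache away with the container:

    # sketch, not the exact CI invocation; image tag is abbreviated
    docker run --rm bigtop/slaves:ubuntu-14.04 \
        bash -c 'cd /bigtop && ./gradlew hadoop-rpm'
    # --rm removes the container after the build, and with it everything
    # that "mvn install" wrote to /root/.m2 inside the container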

[BTW, this triggered the misfeature in BIGTOP-1893: gradle rpm/apt behaved 
differently from a container build that pulled artifacts from Maven Central.]

Option 1)  Remove mvn install from all do-component-builds

Results:

+ We compile projects the way the upstream developers do.
- Local fixes and configurations will not be propagated.
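
Concretely, this would be a one-line change per component, roughly:

    # sketch: in each do-component-build, replace
    mvn clean install -DskipTests
    # with
    mvn clean package -DskipTests
    # "package" still produces the jars for packaging, but no longer
    # copies them into the local cache ~/.m2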

Questions:
If we do not try to reuse our build artifacts when compiling, we have to ask 
ourselves "why do we compile projects at all?"

Comparing artifacts would give us a great test of whether someone else has 
touched/manipulated the Maven Central cache, but is that really the point of 
compiling ourselves?


Option 2) Use mvn install and reuse artifacts even in containers.

Consequences:

- Containers are not stateless any more

- We have to add dependencies between CI jobs so they run in order

- A single component may break the whole compile process.

- Compilation does not scale any more

My Opinion:
The way we do "mvn install" now, simply tainting the Maven cache, does not 
seem like a really controlled way to propagate artifacts to me.
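
For concreteness, keeping the cache across container runs would mean mounting 
it in from the host, roughly like this (the host path is made up):

    # sketch; host cache path is made up
    docker run --rm \
        -v /var/lib/bigtop-ci/m2:/root/.m2 \
        bigtop/slaves:ubuntu-14.04 \
        bash -c 'cd /bigtop && ./gradlew hadoop-rpm'
    # the mounted cache survives the container, so artifacts installed by
    # one job are visible to the next -- which is exactly the state and
    # ordering problem listed above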

Option 3) Use Option 1, but reuse artifacts across packages by placing 
symlinks and dependencies between them.

- Packages will break with subtle problems if we symlink artifacts from 
different releases.
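
To illustrate the kind of breakage I mean, a hypothetical RPM scriptlet 
(package and jar names are made up):

    # hypothetical %post scriptlet; names are made up
    ln -sf /usr/lib/hadoop/hadoop-common.jar \
           /usr/lib/hbase/lib/hadoop-common.jar
    # plus a matching "Requires: hadoop" so the link target exists;
    # if hadoop gets upgraded independently, hbase silently runs
    # against a jar from a different release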

----
Neither Option 1, Option 2, nor Option 3 seems a clever way to fix the 
problem. I would like to hear comments regarding this issue:


In my humble opinion we should follow Option 2, with all its grave 
consequences, but maybe rework "mvn install" so that it places the artifacts 
under a Bigtop-specific name/groupId into the Maven cache and uploads them to 
Maven Central.
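
Something along these lines, per artifact (a sketch using install:install-file; 
the coordinates are made up):

    # sketch; coordinates are made up
    mvn install:install-file \
        -Dfile=build/hadoop-common-2.6.0.jar \
        -DgroupId=org.apache.bigtop.hadoop \
        -DartifactId=hadoop-common \
        -Dversion=2.6.0-bigtop \
        -Dpackaging=jar
    # downstream do-component-builds would then reference the
    # org.apache.bigtop.* coordinates explicitly instead of silently
    # shadowing the upstream artifacts in the cache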

Olaf

> On 18.06.2015 at 08:26, n...@reactor8.com wrote:
> 
> Building on conversations pre/during/post Apachecon and looking at the post 
> 1.0 bigtop focus and efforts, want to lay out a few things, get peoples 
> comments.  Seems to be some consensus that the project can look towards 
> serving end application/data developers more going forward, while continuing 
> the tradition of the projects build/pkg/test/deploy roots.
> 
> I have spent the past couple months, and heavily the past 3 or so weeks, 
> talking to many different potential end users at meetups, conferences, etc.., 
> also having some great conversations with commercial open source vendors that 
> are interested in what a "future bigtop" can be and what it could provide to 
> users.
> 
> I believe we need to put some focused effort into few foundational things to 
> put the project in a position to move faster and attract a wider range of 
> users as well as new contributors.
> 
> -----------
> CI "2.0"
> -----------
> 
> Start of this is already underway based on the work roman started last year 
> and continuing effort with new setup and enhancement on bigtop AWS 
> infrastructure, Evans has been pushing this along into the 1.0 release.  
> Speed of getting new packages built and up to date needs to increase so 
> releases can happen at a regular clip.., even looking towards user friendly 
> "ad-hoc" bigtop builds where users could quickly choose the 2,3,4,etc 
> components they want and have a stack around that.
> 
> Related to this, hoping the group can come to some idea/agreement on some 
> semver style versioning for the project post 1.0.  I think this could set a 
> path forward for releases that can happen faster, while not holding up the 
> whole train if a single "smaller" component has a couple issues that 
> cant/wont be resolved by the main stakeholders or interested parties in said 
> component.  An example might be new pig or sqoop having issues.., the 1.2 
> release would still go out the door with 1.2.1 coming days/weeks later once 
> new pig or sqoop was fixed up.
> 
> ---------------------------------------------
> Proper package repository hosting
> ---------------------------------------------
> 
> I put together a little test setup based on the 0.8 assets, we can probably 
> build off of that with 1.0, working towards the CI automatically posting 
> nightly (or just-in-time) builds off latest so people can play around.  
> Debs/rpms should be the focal pt of output for the project assets, 
> everything else is additive and builds off of that (ie: user who says "I am 
> not a puppet shop so don’t care about the modules.., but do my own automation 
> and if you point me to some sane repositories I can do the rest myself with 
> couple decent getting started steps")
> 
> -----------------------------------------------------------------
> Greatly increasing the UX and getting started content
> -----------------------------------------------------------------
> 
> This is the big one.., new website, focused docs and getting started examples 
> for end users, other specific content for contributors.  I will be starting 
> to put some cycles into new website jira probably starting next week, will 
> try to scoot through it and start posting some working examples for feedback 
> once something basic is in place.  For those interested in helping out on doc 
> work and getting started content let me know.., looking at subjects like:
> 
>   -Developer getting started
>         -using the packages
>         -using puppet modules and deployment options
>         -deploying reference example stacks
>         -setting up your own big data CI
>         -etc
> 
>   -Contributing to Bigtop:
>         -how to submit your first patch/pull-request
>         -adding new component (step by step, canned learning component 
> example, etc)
>         -adding tests to an existing component (steps, canned hello world 
> example test, etc)
>         -writing your own test data generator
>         -etc
> 
> Those are some thoughts and couple initial focal areas that are driving me 
> around bigtop participation
> 
> 
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:apurt...@apache.org]
> Sent: Tuesday, June 16, 2015 12:02 PM
> To: d...@bigtop.apache.org
> Cc: user@bigtop.apache.org
> Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting 
> the backplane ? Containers?
> 
>> thanks andy - i agree with most of your opinions around continuing to build
>> standard packages.. but can you clarify what was offensive ?  must be a
>> misinterpretation somewhere.
> 
> Sure.
> 
> A bit offensive.
> 
> "gridgain or spark can do what 90% of the hadoop ecosystem already does, 
> supporting streams, batch,sql all in one" -> This statement deprecates the 
> utility of the labors of rest of the Hadoop ecosystem in favor of Gridgain 
> and Spark. As a gross generalization it's unlikely to be a helpful statement 
> in any case.
> 
> It's fine if we all have our favorites, of course. I think we're set up well 
> to empirically determine winners and losers, we don't need to make partisan 
> statements. Those components that get some user interest in the form of 
> contributions that keep them building and happy in Bigtop will stay in. Those 
> that do not get the necessary attention will have to be culled out over time 
> when and if they fail to compile or pass integration tests.
> 
> 
> On Mon, Jun 15, 2015 at 11:42 AM, jay vyas <jayunit100.apa...@gmail.com>
> wrote:
> 
>> thanks andy - i agree with most of your opinions around continuing to
>> build standard packages.. but can you clarify what was offensive ?
>> must be a misinterpretation somewhere.
>> 
>> 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
>> we have now.   Thats the core product and will be for  the forseeable
>> future, absolutely !
>> 
>> 2) The idea (and its just an idea i want to throw out - to keep us on
>> our toes), is that some folks may be interested in hacking around, in
>> a separate branch - on some bleeding edge bigdata deployments - which
>> attempts to incorporate resource managers and  containers as
>> first-class citizens.
>> 
>> Again this is all just ideas - not in any way meant to derail the
>> packaging efforts - but rather - just to gauge folks interest level in
>> the bleeding edge, docker, mesos, simplified  processing stacks, and so on.
>> 
>> 
>> 
>> On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell <apurt...@apache.org>
>> wrote:
>> 
>>>> gridgain or spark can do what 90% of the hadoop ecosystem already does,
>>>> supporting streams, batch, sql all in one)
>>> 
>>> If something like this becomes the official position of the Bigtop
>>> project, some day, then it will turn off people. I can see where you
>>> are coming from, I think. Correct me if I'm wrong: We have limited
>>> bandwidth, we should move away from Roman et. al.'s vision of Bigtop
>>> as an inclusive distribution of big data packages, and instead
>>> become highly opinionated and tightly focused. If that's accurate, I
>>> can sum up my concern as
>>> follows: To the degree we become more opinionated, the less we may have to
>>> look at in terms of inclusion - both software and user communities. For
>>> example, I find the above quoted statement a bit offensive as a participant
>>> on not-Spark and not-Gridgain projects. I roll my eyes sometimes at
>>> the Docker over-hype. Is there still a place for me here?
>>> 
>>> 
>>> 
>>> On Mon, Jun 15, 2015 at 9:22 AM, jay vyas
>>> <jayunit100.apa...@gmail.com>
>>> wrote:
>>> 
>>>> Hi folks.   Every few months, i try to reboot the conversation about the
>>>> next generation of bigtop.
>>>> 
>>>> There are 3 things which i think we should consider: A backplane (rather
>>>> than deploy to machines), the meaning of the term "ecosystem" in a
>>>> post-spark in-memory apocalypse, and containerization.
>>>> 
>>>> 1) BACKPLANE: The new trend is to have a backplane that provides
>>>> networking abstractions for you (mesos, kubernetes, yarn, and so on). Is
>>>> it time for us to pick a resource manager?
>>>> 
>>>> 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole
>>>> hadoop ecosystem, and there is a huge shift to in-memory,
>>>> monolithic stacks happening (i.e. gridgain or spark can do what 90%
>>>> of the hadoop ecosystem already does, supporting streams, batch, sql
>>>> all in one).
>>>> 
>>>> 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
>>>> Is it time to start experimenting with running docker tarballs ?
>>>> 
>>>> Combining 1+2+3 - i could see a useful bigdata upstream distro which
>>>> (1) just installed an HCFS implementation (gluster, HDFS, ...) alongside,
>>>> say, (2) mesos as a backplane for the tooling for [[ hbase + spark +
>>>> ignite ]] --- and then (3) do the integration testing of available
>>>> mesos-framework plugins for ignite and spark underneath. If other folks
>>>> are interested, maybe we could create the "1x" or "in-memory" branch to
>>>> start hacking on it sometime? Maybe even bring the flink guys in as well,
>>>> as they are interested in bigtop packaging.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> jay vyas
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein (via Tom White)
>>> 
>> 
>> 
>> 
>> --
>> jay vyas
>> 
> 
> 
> 
> --
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
> 
