Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-22 Thread Andrew Purtell
For as long as we support building components from a GitHub repository by
SHA, we must support the local install steps in do-component-build scripts.
Otherwise the result cannot be transitively consistent.

We should not assume a Bigtop user will be building with a BOM full of
conveniently already released artifacts in public Maven repos (see above),
or even with direct access to public networks. It would be inconvenient, but
I could see an extra 'getting started' step of setting up a local Nexus or
similar. However, when building from SHAs on a dev workstation, the local
Maven cache actually seems the best option. Alternatively, we can declare
that this use case is no longer supported. That would make me sad.
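For reference, that 'getting started' step could be as small as the following sketch. The Nexus image name, port, and repository URL are Nexus 3 defaults assumed here for illustration, not anything Bigtop prescribes:

```shell
# Start a throwaway local Nexus (requires a Docker daemon; image name and
# port are Sonatype's Nexus 3 defaults, assumed here):
#   docker run -d --name local-nexus -p 8081:8081 sonatype/nexus3

# Point Maven at the local mirror via a scratch settings file, so every
# artifact request is resolved through the local Nexus instead of the
# public network:
cat > /tmp/bigtop-settings.xml <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>local-nexus</id>
      <mirrorOf>*</mirrorOf>
      <url>http://localhost:8081/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>
EOF
# Builds would then run as: mvn -s /tmp/bigtop-settings.xml install ...
echo "wrote /tmp/bigtop-settings.xml"
```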

Would also be great if we can continue supporting Bigtop package builds on
build servers and developer workstations without requiring any specific
containerization technology (or any containerization at all). Giving people
an option to use Docker specific techniques is fine as long as it is
totally optional.


On Fri, Jun 19, 2015 at 11:26 PM, Bruno Mahé bm...@apache.org wrote:

  Echoing both Nate and Evans, I would not limit ourselves based on the
 technology used for the build.

 However, I am not sure I completely follow option 3. We are doing that
 already for packages. For instance, if package A depends on Apache
 Zookeeper, then package A does depend on Apache Zookeeper and includes
 symlinks to the Apache Zookeeper library provided by the Apache Zookeeper
 package.


 Thanks,
 Bruno



 On 06/19/2015 12:47 PM, n...@reactor8.com wrote:

  Echoing Evans, I think we should not be worried about stateless vs
 non-stateless containers; the core idea and need is to optimize the
 build process and maximize re-use whether on host or container machines or
 build environments.



 Added a sub-task with Olaf's idea to Evans's umbrella CI task, currently
 marked for 1.1:



 https://issues.apache.org/jira/browse/BIGTOP-1906







 *From:* Evans Ye [mailto:evan...@apache.org]
 *Sent:* Friday, June 19, 2015 7:16 AM
 *To:* user@bigtop.apache.org
 *Subject:* Re: Rebooting the conversation on the Future of bigtop:
 Abstracting the backplane ? Containers?



 I think it's not a problem that containers are not stateless. In any case,
 we should have CI jobs that build all the artifacts and store them as
 official repos.
 You point out an important thing: mvn install is the key
 feature to propagate self-patched components around. If we disable this,
 then there's no reason to build jars ourselves. I'm +1 to option 2.

 On June 19, 2015, 5:59 AM, Olaf Flebbe o...@oflebbe.de wrote:


  On 18.06.2015 at 23:57, jay vyas jayunit100.apa...@gmail.com wrote:
 
  You can easily share the artifacts with a docker shared volume
 
  in the container EXPORT M2_HOME=/container/m2/
 
  followed by
 
  docker build -v ~/.m2/ /container/m2/  
 
  This will put the mvn jars into the host rather than the guest
 container, so that they persist.
 
 

 That's not the point. Containers are not stateless any more.

 Olaf





Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-20 Thread Bruno Mahé
Echoing both Nate and Evans, I would not limit ourselves based on the 
technology used for the build.


However, I am not sure I completely follow option 3. We are doing that
already for packages. For instance, if package A depends on Apache
Zookeeper, then package A does depend on Apache Zookeeper and
includes symlinks to the Apache Zookeeper library provided by the Apache
Zookeeper package.



Thanks,
Bruno


On 06/19/2015 12:47 PM, n...@reactor8.com wrote:


Echoing Evans, I think we should not be worried about stateless vs
non-stateless containers; the core idea and need is to optimize
the build process and maximize re-use whether on host or container
machines or build environments.


Added a sub-task with Olaf's idea to Evans's umbrella CI task, currently
marked for 1.1:


https://issues.apache.org/jira/browse/BIGTOP-1906

*From:* Evans Ye [mailto:evan...@apache.org]
*Sent:* Friday, June 19, 2015 7:16 AM
*To:* user@bigtop.apache.org
*Subject:* Re: Rebooting the conversation on the Future of bigtop: 
Abstracting the backplane ? Containers?


I think it's not a problem that containers are not stateless. In any case,
we should have CI jobs that build all the artifacts and store
them as official repos.
You point out an important thing: mvn install is the key
feature to propagate self-patched components around. If we disable
this, then there's no reason to build jars ourselves. I'm +1 to
option 2.


On June 19, 2015, 5:59 AM, Olaf Flebbe o...@oflebbe.de wrote:



 On 18.06.2015 at 23:57, jay vyas jayunit100.apa...@gmail.com wrote:

 You can easily share the artifacts with a docker shared volume

 in the container EXPORT M2_HOME=/container/m2/

 followed by

 docker build -v ~/.m2/ /container/m2/  

 This will put the mvn jars into the host rather than the guest
container, so that they persist.



That's not the point. Containers are not stateless any more.

Olaf





Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-19 Thread Evans Ye
I think it's not a problem that containers are not stateless. In any case, we
should have CI jobs that build all the artifacts and store them as
official repos.
You point out an important thing: mvn install is the key feature
to propagate self-patched components around. If we disable this, then
there's no reason to build jars ourselves. I'm +1 to option 2.
On June 19, 2015, 5:59 AM, Olaf Flebbe o...@oflebbe.de wrote:


  On 18.06.2015 at 23:57, jay vyas jayunit100.apa...@gmail.com wrote:
 
  You can easily share the artifacts with a docker shared volume
 
  in the container EXPORT M2_HOME=/container/m2/
 
  followed by
 
  docker build -v ~/.m2/ /container/m2/  
 
  This will put the mvn jars into the host rather than the guest
 container, so that they persist.
 
 

 That's not the point. Containers are not stateless any more.

 Olaf



RE: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-19 Thread nate
Echoing Evans, I think we should not be worried about stateless vs non-stateless
containers; the core idea and need is to optimize the build process and
maximize re-use whether on host or container machines or build environments.

 

Added a sub-task with Olaf's idea to Evans's umbrella CI task, currently marked
for 1.1:

 

https://issues.apache.org/jira/browse/BIGTOP-1906

 

 

 

From: Evans Ye [mailto:evan...@apache.org] 
Sent: Friday, June 19, 2015 7:16 AM
To: user@bigtop.apache.org
Subject: Re: Rebooting the conversation on the Future of bigtop: Abstracting 
the backplane ? Containers?

 

I think it's not a problem that containers are not stateless. In any case, we
should have CI jobs that build all the artifacts and store them as official
repos.
You point out an important thing: mvn install is the key feature to
propagate self-patched components around. If we disable this, then there's no
reason to build jars ourselves. I'm +1 to option 2.

On June 19, 2015, 5:59 AM, Olaf Flebbe o...@oflebbe.de wrote:


 On 18.06.2015 at 23:57, jay vyas jayunit100.apa...@gmail.com wrote:

 You can easily share the artifacts with a docker shared volume

 in the container EXPORT M2_HOME=/container/m2/

 followed by

 docker build -v ~/.m2/ /container/m2/  

 This will put the mvn jars into the host rather than the guest container, so
 that they persist.



That's not the point. Containers are not stateless any more.

Olaf



Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-18 Thread jay vyas
You can easily share the artifacts with a docker shared volume

in the container EXPORT M2_HOME=/container/m2/

followed by

docker build -v ~/.m2/ /container/m2/  

This will put the mvn jars into the host rather than the guest container,
so that they persist.
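A concrete sketch of that volume-sharing idea, with two caveats: `-v` is a `docker run` flag, not a `docker build` one, and strictly speaking Maven's local repository location is controlled by `maven.repo.local` (`M2_HOME` points at the Maven installation). The image name below is hypothetical:

```shell
# Mount the host's Maven cache into the build container so that jars
# installed via `mvn install` land on the host and persist after the
# container exits.
M2_HOST="$HOME/.m2"
M2_GUEST="/container/m2"
mkdir -p "$M2_HOST"
# Actual build invocation (requires a Docker daemon; image is hypothetical):
#   docker run --rm -v "$M2_HOST:$M2_GUEST" \
#       -e MAVEN_OPTS="-Dmaven.repo.local=$M2_GUEST/repository" \
#       bigtop/build:centos7 ./do-component-build
echo "-v $M2_HOST:$M2_GUEST"
```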




On Thu, Jun 18, 2015 at 5:32 PM, Olaf Flebbe o...@oflebbe.de wrote:

 Thanks Nate

 for this focused writeup!

 Yeah maybe it is time to reboot our brains ...

 Additionally to Nate's points, I would like to attack this in bigtop
 1.1.0:

 ..
 Building from source or downloading?
 …

 However, we have a substantial problem hidden deep in the CI "2.0" approach
 using containers.

 You may know that we place artifacts (i.e. jars) we built with bigtop into
 the local maven cache ~/.m2 (look for mvn install in do-component-build).
 The idea is that later maven builds will pick up these artifacts and use them
 rather than downloading them from maven central.

 Placing artifacts into ~/.m2 will not have any effect if we use CI
 containers the way we do now: The maven cache ~/.m2 is lost when the
 container ends.
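As a sketch of the pattern Olaf describes; the component, version, and resulting path are illustrative, not taken from the actual Bigtop scripts:

```shell
# Hypothetical do-component-build fragment: build the component and install
# the resulting jars into the local Maven repository (~/.m2/repository), so
# that later component builds resolve the just-built jars there instead of
# pulling released ones from Maven Central:
#   mvn -DskipTests install
# A dependent build would then find, e.g. (coordinates illustrative):
ARTIFACT="$HOME/.m2/repository/org/apache/hadoop/hadoop-common/2.7.1/hadoop-common-2.7.1.jar"
echo "$ARTIFACT"
```

This is exactly the state that is lost when the container's filesystem is discarded, which is the problem being discussed.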

 [This triggered the misfeature in JIRA BIGTOP-1893, BTW: gradle rpm/apt
 behaved differently from a container build with artifacts from maven
 central.]

 Option 1)  Remove mvn install from all do-component-builds

 Results:

 + We compile projects the way the upstream developer does.
 - Local fixes and configurations will not be propagated.

 Questions:
 If we do not try to reuse our build artifacts within compile, we have to
 ask ourselves "why do we compile projects at all?".

 We can build a great test of whether someone else has touched / manipulated
 the maven central cache if we compare artifacts, but is this really the
 point of compiling ourselves?


 Option 2) Use mvn install and reuse artifacts even in containers.

 Consequences:

 - Containers are not stateless any more

 - We have to add dependencies to CI jobs so they run in order

 - single components may break the whole compile process.

 - Compile does not scale any more

 My Opinion:
 The way we do "mvn install" now, simply tainting the maven cache, does not
 seem a really controlled way to propagate artifacts to me.

 Option 3) Use 1) but reuse artifacts in packages by placing symlinks and
 dependencies between them.

 - Packages will break with subtle problems if we do symlink artifacts
 from different releases.

 
 Neither Option 1, Option 2 nor Option 3 seems a clever way to fix the
 problem. Would like to hear comments regarding this issue:


 In my humble opinion we should follow Option 2 with all the grave
 consequences. But maybe rework mvn install by placing the artifacts with
 a bigtop-specific name / groupId into the maven cache and uploading them to
 maven central.
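One possible shape for that reworked mvn install, using Maven's standard `install:install-file` goal; the groupId, version suffix, and jar path are purely illustrative, not an actual Bigtop convention:

```shell
# Re-install a built jar under a Bigtop-specific groupId and version so it
# can never be mistaken for, or clobbered by, an upstream release in ~/.m2.
GROUP_ID="org.apache.bigtop.components"   # hypothetical coordinate
VERSION="2.7.1-bigtop"                    # hypothetical version suffix
# Invocation (requires Maven; jar path is illustrative):
#   mvn install:install-file \
#     -Dfile=target/hadoop-common-2.7.1.jar \
#     -DgroupId="$GROUP_ID" -DartifactId=hadoop-common \
#     -Dversion="$VERSION" -Dpackaging=jar
echo "$GROUP_ID:hadoop-common:$VERSION"
```

Downstream do-component-builds would then have to depend on the bigtop-specific coordinate explicitly, which makes the propagation controlled rather than an accident of cache tainting.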

 Olaf










  On 18.06.2015 at 08:26, n...@reactor8.com wrote:
 
  Building on conversations pre/during/post ApacheCon and looking at the
 post-1.0 bigtop focus and efforts, I want to lay out a few things and get
 people's comments.  Seems to be some consensus that the project can look
 towards serving end application/data developers more going forward, while
 continuing the tradition of the project's build/pkg/test/deploy roots.
 
  I have spent the past couple months, and heavily the past 3 or so weeks,
 talking to many different potential end users at meetups, conferences,
 etc.., also having some great conversations with commercial open source
 vendors that are interested in what a future bigtop can be and what it
 could provide to users.
 
  I believe we need to put some focused effort into a few foundational
 things to put the project in a position to move faster and attract a wider
 range of users as well as new contributors.
 
  ---
  CI 2.0
  ---
 
  Start of this is already underway based on the work Roman started last
 year and continuing effort with new setup and enhancement on bigtop AWS
 infrastructure; Evans has been pushing this along into the 1.0 release.
 Speed of getting new packages built and up to date needs to increase so
 releases can happen at a regular clip, even looking towards user-friendly
 ad-hoc bigtop builds where users could quickly choose the 2, 3, 4, etc.
 components they want and have a stack around that.
 
  Related to this, hoping the group can come to some idea/agreement on
 some semver-style versioning for the project post 1.0.  I think this could
 set a path forward for releases that can happen faster, while not holding
 up the whole train if a single smaller component has a couple issues that
 can't/won't be resolved by the main stakeholders or interested parties in
 said component.  An example might be new pig or sqoop having issues; the
 1.2 release would still go out the door with 1.2.1 coming days/weeks later
 once new pig or sqoop was fixed up.
 
  -
  Proper package repository hosting
  -
 
  I put together a little test setup based on the 0.8 assets, we can
 

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-17 Thread Evans Ye
I personally like the idea of including mesos, docker stuff as
options we provide to users. No doubt that these solutions can gain more
eyeballs and attract new users/contributors to the bigtop family.
As mentioned above, the use case seems still not clear yet, but there might
be a chance that people will start to adopt them because of bigtop's
support.

Coming back to earth, to be honest I don't need things like mesos either. So
including it would be a little bit difficult since there's no real demand
on it so far. Probably people who are interested in the technology can start
with an alpha/experimental feature and see if lots of people have interest in
it. The feature should be able to be pushed into our code base as long as
there's a maintainer who is committed to maintaining it. I can pair program with
Jay if you'd like to work on it. :)
 On June 17, 2015, 3:02 AM, Andrew Purtell apurt...@apache.org wrote:

  thanks andy - i agree with most of your opinions around continuing to
 build
 standard packages.. but can you clarify what was offensive ?  must be a
 misinterpretation somewhere.

 Sure.

 A bit offensive.

  "gridgain or spark can do what 90% of the hadoop ecosystem already does,
  supporting streams, batch, sql all in one" - This statement deprecates the
  utility of the labors of the rest of the Hadoop ecosystem in favor of Gridgain
  and Spark. As a gross generalization it's unlikely to be a helpful
  statement in any case.

 It's fine if we all have our favorites, of course. I think we're set up
 well to empirically determine winners and losers, we don't need to make
 partisan statements. Those components that get some user interest in the
 form of contributions that keep them building and happy in Bigtop will stay
 in. Those that do not get the necessary attention will have to be culled
 out over time when and if they fail to compile or pass integration tests.


 On Mon, Jun 15, 2015 at 11:42 AM, jay vyas jayunit100.apa...@gmail.com
 wrote:

 thanks andy - i agree with most of your opinions around continuing to
 build
 standard packages.. but can you clarify what was offensive ?  must be a
 misinterpretation somewhere.

 1) To be clear, i am 100% behind supporting standard hadoop build rpms
 that
  we have now.   That's the core product and will be for the foreseeable
 future, absolutely !

 2) The idea (and its just an idea i want to throw out - to keep us on our
 toes), is that some folks may be interested in hacking around, in a
 separate branch - on some bleeding edge bigdata deployments - which
 attempts to incorporate resource managers and  containers as first-class
 citizens.

 Again this is all just ideas - not in any way meant to derail the
 packaging
  efforts - but rather - just to gauge folks' interest level in the bleeding
 edge, docker, mesos, simplified  processing stacks, and so on.



 On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell apurt...@apache.org
 wrote:

   gridgain or spark can do what 90% of the hadoop ecosystem already
 does,
  supporting streams, batch,sql all in one)
 
  If something like this becomes the official position of the Bigtop
  project, some day, then it will turn off people. I can see where you are
  coming from, I think. Correct me if I'm wrong: We have limited
 bandwidth,
  we should move away from Roman et. al.'s vision of Bigtop as an
 inclusive
  distribution of big data packages, and instead become highly opinionated
  and tightly focused. If that's accurate, I can sum up my concern as
  follows: To the degree we become more opinionated, the less we may have
 to
  look at in terms of inclusion - both software and user communities. For
  example, I find the above quoted statement a bit offensive as a
 participant
  on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
  Docker over-hype. Is there still a place for me here?
 
 
 
  On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com
  wrote:
 
  Hi folks.   Every few months, i try to reboot the conversation about
 the
  next generation of bigtop.
 
  There are 3 things which i think we should consider : A backplane (rather
  than deploying to machines), the meaning of the term ecosystem in a
  post-spark in-memory apocalypse, and containerization.
 
  1) BACKPLANE: The new trend is to have a backplane that provides
  networking abstractions for you (mesos, kubernetes, yarn, and so on).
  Is
  it time for us to pick a resource manager?
 
  2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
  ecosystem, and there is a huge shift to in-memory, monolithic stacks
  happening (i.e. gridgain or spark can do what 90% of the hadoop
 ecosystem
  already does, supporting streams, batch,sql all in one).
 
  3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
  Is it time to start experimenting with running docker tarballs ?
 
  Combining 1+2+3 - i could see a useful bigdata upstream distro which
 (1)
  just installed an HCFS implementation (gluster,HDFS,...) along side,
 say,
  

Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Jay Vyas
You're right, Bruno, I could, but I have no need of such a thing :)

And in any case --- this thread is just about sharing ideas, letting the whole 
community speak up about their opinions on the future of bigtop. It's not about
driving a particular project direction. 

Bigtop is a unique project in that we integrate a lot of tools in a rapidly 
changing landscape, so it's good to have some feelers out there to see what our 
users are thinking.  

Thanks all for the feedback, hope to get more!
 
 On Jun 16, 2015, at 2:11 AM, Bruno Mahé bm...@apache.org wrote:
 
 On 06/15/2015 09:22 AM, jay vyas wrote:
 Hi folks.   Every few months, i try to reboot the conversation about the 
 next generation of bigtop.
 
 There are 3 things which i think we should consider : A backplane (rather
 than deploying to machines), the meaning of the term ecosystem in a post-spark
 in-memory apocalypse, and containerization.
 
 1) BACKPLANE: The new trend is to have a backplane that provides networking 
 abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for 
 us to pick a resource manager?
 
 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
 ecosystem, and there is a huge shift to in-memory, monolithic stacks 
 happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem 
 already does, supporting streams, batch,sql all in one).
 
 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is 
 it time to start experimenting with running docker tarballs ?
 
 Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) 
 just installed an HCFS implementation (gluster,HDFS,...) along side, say, 
 (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] 
 --- and then (3) do the integration testing of available mesos-framework 
 plugins for ignite and spark underneath.  If other folks are interested, 
 maybe we could create the 1x or in-memory branch to start hacking on it 
 sometime ?Maybe even bring the flink guys in as well, as they are 
 interested in bigtop packaging.
 
 
 
 -- 
 jay vyas
 
 
 I have roughly the same position as Andrew on that matter.
 
 What prevents you from starting something yourself to start hacking on it?
 
 
 Thanks,
 Bruno


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Andrew Purtell
 thanks andy - i agree with most of your opinions around continuing to
build
standard packages.. but can you clarify what was offensive ?  must be a
misinterpretation somewhere.

Sure.

A bit offensive.

"gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch, sql all in one" - This statement deprecates the
utility of the labors of the rest of the Hadoop ecosystem in favor of Gridgain
and Spark. As a gross generalization it's unlikely to be a helpful
statement in any case.

It's fine if we all have our favorites, of course. I think we're set up
well to empirically determine winners and losers, we don't need to make
partisan statements. Those components that get some user interest in the
form of contributions that keep them building and happy in Bigtop will stay
in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas jayunit100.apa...@gmail.com
wrote:

 thanks andy - i agree with most of your opinions around continuing to build
 standard packages.. but can you clarify what was offensive ?  must be a
 misinterpretation somewhere.

 1) To be clear, i am 100% behind supporting standard hadoop build rpms that
 we have now.   That's the core product and will be for the foreseeable
 future, absolutely !

 2) The idea (and its just an idea i want to throw out - to keep us on our
 toes), is that some folks may be interested in hacking around, in a
 separate branch - on some bleeding edge bigdata deployments - which
 attempts to incorporate resource managers and  containers as first-class
 citizens.

 Again this is all just ideas - not in any way meant to derail the packaging
 efforts - but rather - just to gauge folks' interest level in the bleeding
 edge, docker, mesos, simplified  processing stacks, and so on.



 On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell apurt...@apache.org
 wrote:

   gridgain or spark can do what 90% of the hadoop ecosystem already does,
  supporting streams, batch,sql all in one)
 
  If something like this becomes the official position of the Bigtop
  project, some day, then it will turn off people. I can see where you are
  coming from, I think. Correct me if I'm wrong: We have limited bandwidth,
  we should move away from Roman et. al.'s vision of Bigtop as an inclusive
  distribution of big data packages, and instead become highly opinionated
  and tightly focused. If that's accurate, I can sum up my concern as
  follows: To the degree we become more opinionated, the less we may have
 to
  look at in terms of inclusion - both software and user communities. For
  example, I find the above quoted statement a bit offensive as a
 participant
  on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
  Docker over-hype. Is there still a place for me here?
 
 
 
  On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com
  wrote:
 
  Hi folks.   Every few months, i try to reboot the conversation about the
  next generation of bigtop.
 
  There are 3 things which i think we should consider : A backplane (rather
  than deploying to machines), the meaning of the term ecosystem in a
  post-spark in-memory apocalypse, and containerization.
 
  1) BACKPLANE: The new trend is to have a backplane that provides
  networking abstractions for you (mesos, kubernetes, yarn, and so on).
  Is
  it time for us to pick a resource manager?
 
  2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
  ecosystem, and there is a huge shift to in-memory, monolithic stacks
  happening (i.e. gridgain or spark can do what 90% of the hadoop
 ecosystem
  already does, supporting streams, batch,sql all in one).
 
  3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
  Is it time to start experimenting with running docker tarballs ?
 
  Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
  just installed an HCFS implementation (gluster,HDFS,...) along side,
 say,
  (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite
 ]]
  --- and then (3) do the integration testing of available mesos-framework
  plugins for ignite and spark underneath.  If other folks are interested,
  maybe we could create the 1x or in-memory branch to start hacking
 on it
  sometime ?Maybe even bring the flink guys in as well, as they are
  interested in bigtop packaging.
 
 
 
  --
  jay vyas
 
 
 
 
  --
  Best regards,
 
 - Andy
 
  Problems worthy of attack prove their worth by hitting back. - Piet Hein
  (via Tom White)
 



 --
 jay vyas




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Bruno Mahé

On 06/15/2015 09:22 AM, jay vyas wrote:
Hi folks.   Every few months, i try to reboot the conversation about 
the next generation of bigtop.


There are 3 things which i think we should consider : A backplane
(rather than deploying to machines), the meaning of the term ecosystem
in a post-spark in-memory apocalypse, and containerization.


1) BACKPLANE: The new trend is to have a backplane that provides 
networking abstractions for you (mesos, kubernetes, yarn, and so 
on).   Is it time for us to pick a resource manager?


2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
ecosystem, and there is a huge shift to in-memory, monolithic stacks 
happening (i.e. gridgain or spark can do what 90% of the hadoop 
ecosystem already does, supporting streams, batch,sql all in one).


3) CONTAINERS:  we are doing a great job w/ docker in our build 
infra.  Is it time to start experimenting with running docker tarballs ?


Combining 1+2+3 - i could see a useful bigdata upstream distro which 
(1) just installed an HCFS implementation (gluster,HDFS,...) along 
side, say, (2) mesos as a backplane for the tooling for [[ hbase + 
spark + ignite ]] --- and then (3) do the integration testing of 
available mesos-framework plugins for ignite and spark underneath.  If 
other folks are interested, maybe we could create the 1x or 
in-memory branch to start hacking on it sometime ?Maybe even 
bring the flink guys in as well, as they are interested in bigtop 
packaging.




--
jay vyas



I have roughly the same position as Andrew on that matter.

What prevents you from starting something yourself to start hacking on it?


Thanks,
Bruno


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-15 Thread jay vyas
thanks andy - i agree with most of your opinions around continuing to build
standard packages.. but can you clarify what was offensive ?  must be a
misinterpretation somewhere.

1) To be clear, i am 100% behind supporting standard hadoop build rpms that
we have now.   That's the core product and will be for the foreseeable
future, absolutely !

2) The idea (and its just an idea i want to throw out - to keep us on our
toes), is that some folks may be interested in hacking around, in a
separate branch - on some bleeding edge bigdata deployments - which
attempts to incorporate resource managers and  containers as first-class
citizens.

Again this is all just ideas - not in any way meant to derail the packaging
efforts - but rather - just to gauge folks' interest level in the bleeding
edge, docker, mesos, simplified  processing stacks, and so on.



On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell apurt...@apache.org
wrote:

  gridgain or spark can do what 90% of the hadoop ecosystem already does,
 supporting streams, batch,sql all in one)

 If something like this becomes the official position of the Bigtop
 project, some day, then it will turn off people. I can see where you are
 coming from, I think. Correct me if I'm wrong: We have limited bandwidth,
 we should move away from Roman et. al.'s vision of Bigtop as an inclusive
 distribution of big data packages, and instead become highly opinionated
 and tightly focused. If that's accurate, I can sum up my concern as
 follows: To the degree we become more opinionated, the less we may have to
 look at in terms of inclusion - both software and user communities. For
 example, I find the above quoted statement a bit offensive as a participant
 on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
 Docker over-hype. Is there still a place for me here?



 On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com
 wrote:

 Hi folks.   Every few months, i try to reboot the conversation about the
 next generation of bigtop.

 There are 3 things which i think we should consider : A backplane (rather
 than deploying to machines), the meaning of the term ecosystem in a
 post-spark in-memory apocalypse, and containerization.

 1) BACKPLANE: The new trend is to have a backplane that provides
 networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is
 it time for us to pick a resource manager?

 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
 ecosystem, and there is a huge shift to in-memory, monolithic stacks
 happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
 already does, supporting streams, batch,sql all in one).

 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
 Is it time to start experimenting with running docker tarballs ?

 Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
 just installed an HCFS implementation (gluster,HDFS,...) along side, say,
 (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
 --- and then (3) do the integration testing of available mesos-framework
 plugins for ignite and spark underneath.  If other folks are interested,
 maybe we could create the 1x or in-memory branch to start hacking on it
 sometime ?Maybe even bring the flink guys in as well, as they are
 interested in bigtop packaging.



 --
 jay vyas




 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)




-- 
jay vyas


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-15 Thread Andrew Purtell
 Is it time for us to pick a resource manager?

Not if we want to be like a Debian for big data software.  I'm not sure we
want to limit our reach by being overly opinionated. With my user's hat on,
if we don't package Hadoop and YARN, then I wouldn't have any use for
Bigtop.

 Nowadays folks don't necessarily need the whole hadoop ecosystem, and
there is a huge shift to in-memory, monolithic stacks happening

A Bigtop user would only need to install the packages they would like to
use, right? Is this an argument for exclusion? Exclusion of what?

 Is it time to start experimenting with running docker tarballs ?

This sounds fine as an additional target for builds, but not if it leads
to a proposal to do away with the OS native packaging. That's useful too.
Containers are trendy but not useful or even appropriate for every
environment or use case.




On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com
wrote:

 Hi folks.   Every few months, i try to reboot the conversation about the
 next generation of bigtop.

 There are 3 things which i think we should consider : A backplane (rather
 than deploying to machines), the meaning of the term ecosystem in a
 post-spark in-memory apocalypse, and containerization.

 1) BACKPLANE: The new trend is to have a backplane that provides
 networking abstractions for you (mesos, kubernetes, yarn, and so on).   Is
 it time for us to pick a resource manager?

 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
 ecosystem, and there is a huge shift to in-memory, monolithic stacks
 happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
 already does, supporting streams, batch,sql all in one).

 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
 it time to start experimenting with running docker tarballs ?

 Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
 just installed an HCFS implementation (gluster,HDFS,...) along side, say,
 (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
 --- and then (3) do the integration testing of available mesos-framework
 plugins for ignite and spark underneath.  If other folks are interested,
 maybe we could create the 1x or in-memory branch to start hacking on it
 sometime ?Maybe even bring the flink guys in as well, as they are
 interested in bigtop packaging.



 --
 jay vyas




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-15 Thread Roman Shaposhnik
On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com wrote:
 Hi folks.   Every few months, i try to reboot the conversation about the
 next generation of bigtop.

 There are 3 things which i think we should consider : A backplane (rather
 than deploying to machines), the meaning of the term ecosystem in a post-spark
 in-memory apocalypse, and containerization.

 1) BACKPLANE: The new trend is to have a backplane that provides networking
 abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for
 us to pick a resource manager?

Let me rephrase the above and see if we're talking about the same thing. To
me your question is really about what does a datacenter look like to Bigtop.
Today a datacenter looks to Bigtop as a bunch of individual nodes running
some kind of a Linux distribution. What you seem to be asking is that whether
it is time for us to embrace the vision of a datacenter that looks like mesos,
etc. Correct?

Also, I don't think you're suggesting that we drop the bread-n-butter of Bigtop,
but I still need to make sure.

 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
 ecosystem, and there is a huge shift to in-memory, monolithic stacks
 happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem
 already does, supporting streams, batch,sql all in one).

Correct. That said, I'm not sure what it means for Bigtop.

 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is
 it time to start experimenting with running docker tarballs ?

I think it is time, but

 Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
 just installed an HCFS implementation (gluster,HDFS,...) along side, say,
 (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
 --- and then (3) do the integration testing of available mesos-framework
 plugins for ignite and spark underneath.  If other folks are interested,
 maybe we could create the 1x or in-memory branch to start hacking on it
 sometime ?Maybe even bring the flink guys in as well, as they are
 interested in bigtop packaging.

I'm actually very curious about use cases that folks might have around
traditional Hadoop Distributions. What you're articulating above seems
like one of those use cases, but at this point I'm sort of lost as to
what's the most common use case.

Thanks,
Roman.