Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Jay Vyas
You're right, Bruno, I could, but I have no need of such a thing :)

And in any case --- this thread is just about sharing ideas, letting the whole 
community speak up about their opinions on the future of bigtop. It's not about 
driving a particular project direction. 

Bigtop is a unique project in that we integrate a lot of tools in a rapidly 
changing landscape, so it's good to have some feelers out there to see what our 
users are thinking.  

Thanks all for the feedback, hope to get more!
 
 On Jun 16, 2015, at 2:11 AM, Bruno Mahé bm...@apache.org wrote:
 
 On 06/15/2015 09:22 AM, jay vyas wrote:
 Hi folks.   Every few months, i try to reboot the conversation about the 
 next generation of bigtop.
 
 There are 3 things which i think we should consider: a backplane (rather 
 than deploying to machines), the meaning of the term ecosystem in a post-spark 
 in-memory apocalypse, and containerization.
 
 1) BACKPLANE: The new trend is to have a backplane that provides networking 
 abstractions for you (mesos, kubernetes, yarn, and so on).   Is it time for 
 us to pick a resource manager?
 
 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
 ecosystem, and there is a huge shift to in-memory, monolithic stacks 
 happening (e.g. gridgain or spark can do what 90% of the hadoop ecosystem 
 already does, supporting streams, batch, sql all in one).
 
 3) CONTAINERS:  we are doing a great job w/ docker in our build infra.  Is 
 it time to start experimenting with running docker tarballs ?
 
 Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) 
 just installed an HCFS implementation (gluster, HDFS, ...) alongside, say, 
 (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] 
 --- and then (3) do the integration testing of available mesos-framework 
 plugins for ignite and spark underneath.  If other folks are interested, 
 maybe we could create the 1x or in-memory branch to start hacking on it 
 sometime? Maybe even bring the flink guys in as well, as they are 
 interested in bigtop packaging.
 
 
 
 -- 
 jay vyas
 
 
 I have roughly the same position as Andrew on that matter.
 
 What prevents you from starting something yourself to start hacking on it?
 
 
 Thanks,
 Bruno


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Andrew Purtell
 thanks andy - i agree with most of your opinions around continuing to build
standard packages.. but can you clarify what was offensive ?  must be a
misinterpretation somewhere.

Sure.

A bit offensive.

gridgain or spark can do what 90% of the hadoop ecosystem already does,
supporting streams, batch, sql all in one - This statement deprecates the
utility of the labors of the rest of the Hadoop ecosystem in favor of Gridgain
and Spark. As a gross generalization it's unlikely to be a helpful
statement in any case.

It's fine if we all have our favorites, of course. I think we're set up
well to empirically determine winners and losers; we don't need to make
partisan statements. Those components that get some user interest in the
form of contributions that keep them building and happy in Bigtop will stay
in. Those that do not get the necessary attention will have to be culled
out over time when and if they fail to compile or pass integration tests.


On Mon, Jun 15, 2015 at 11:42 AM, jay vyas jayunit100.apa...@gmail.com
wrote:

 thanks andy - i agree with most of your opinions around continuing to build
 standard packages.. but can you clarify what was offensive ?  must be a
 misinterpretation somewhere.

 1) To be clear, i am 100% behind supporting the standard hadoop build rpms that
 we have now.   That's the core product and will be for the foreseeable
 future, absolutely !

 2) The idea (and it's just an idea i want to throw out - to keep us on our
 toes), is that some folks may be interested in hacking around, in a
 separate branch - on some bleeding edge bigdata deployments - which
 attempts to incorporate resource managers and  containers as first-class
 citizens.

 Again this is all just ideas - not in any way meant to derail the packaging
 efforts - but rather - just to gauge folks' interest level in the bleeding
 edge, docker, mesos, simplified  processing stacks, and so on.



 On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell apurt...@apache.org
 wrote:

   gridgain or spark can do what 90% of the hadoop ecosystem already does,
  supporting streams, batch, sql all in one
 
  If something like this becomes the official position of the Bigtop
  project, some day, then it will turn off people. I can see where you are
  coming from, I think. Correct me if I'm wrong: We have limited bandwidth,
  we should move away from Roman et al.'s vision of Bigtop as an inclusive
  distribution of big data packages, and instead become highly opinionated
  and tightly focused. If that's accurate, I can sum up my concern as
  follows: To the degree we become more opinionated, the less we may have
  to look at in terms of inclusion - both software and user communities. For
  example, I find the above quoted statement a bit offensive as a participant
  on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the
  Docker over-hype. Is there still a place for me here?
 
 
 
  On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com
  wrote:
 
  Hi folks.   Every few months, i try to reboot the conversation about the
  next generation of bigtop.
 
  There are 3 things which i think we should consider: a backplane (rather
  than deploying to machines), the meaning of the term ecosystem in a
  post-spark in-memory apocalypse, and containerization.
 
  1) BACKPLANE: The new trend is to have a backplane that provides
  networking abstractions for you (mesos, kubernetes, yarn, and so on).
  Is it time for us to pick a resource manager?
 
  2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop
  ecosystem, and there is a huge shift to in-memory, monolithic stacks
  happening (e.g. gridgain or spark can do what 90% of the hadoop ecosystem
  already does, supporting streams, batch, sql all in one).
 
  3) CONTAINERS:  we are doing a great job w/ docker in our build infra.
  Is it time to start experimenting with running docker tarballs ?
 
  Combining 1+2+3 - i could see a useful bigdata upstream distro which (1)
  just installed an HCFS implementation (gluster, HDFS, ...) alongside, say,
  (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]]
  --- and then (3) do the integration testing of available mesos-framework
  plugins for ignite and spark underneath.  If other folks are interested,
  maybe we could create the 1x or in-memory branch to start hacking on it
  sometime? Maybe even bring the flink guys in as well, as they are
  interested in bigtop packaging.
 
 
 
  --
  jay vyas
 
 
 
 
  --
  Best regards,
 
 - Andy
 
  Problems worthy of attack prove their worth by hitting back. - Piet Hein
  (via Tom White)
 



 --
 jay vyas




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

2015-06-16 Thread Bruno Mahé

On 06/15/2015 09:22 AM, jay vyas wrote:
Hi folks.   Every few months, i try to reboot the conversation about 
the next generation of bigtop.


There are 3 things which i think we should consider: a backplane 
(rather than deploying to machines), the meaning of the term ecosystem 
in a post-spark in-memory apocalypse, and containerization.


1) BACKPLANE: The new trend is to have a backplane that provides 
networking abstractions for you (mesos, kubernetes, yarn, and so 
on).   Is it time for us to pick a resource manager?


2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop 
ecosystem, and there is a huge shift to in-memory, monolithic stacks 
happening (e.g. gridgain or spark can do what 90% of the hadoop 
ecosystem already does, supporting streams, batch, sql all in one).


3) CONTAINERS:  we are doing a great job w/ docker in our build 
infra.  Is it time to start experimenting with running docker tarballs ?


Combining 1+2+3 - i could see a useful bigdata upstream distro which 
(1) just installed an HCFS implementation (gluster, HDFS, ...) 
alongside, say, (2) mesos as a backplane for the tooling for [[ hbase + 
spark + ignite ]] --- and then (3) do the integration testing of 
available mesos-framework plugins for ignite and spark underneath.  If 
other folks are interested, maybe we could create the 1x or 
in-memory branch to start hacking on it sometime? Maybe even 
bring the flink guys in as well, as they are interested in bigtop 
packaging.




--
jay vyas



I have roughly the same position as Andrew on that matter.

What prevents you from starting something yourself to start hacking on it?


Thanks,
Bruno