Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
For as long as we support building components from a GitHub repository by SHA, we must support the local install steps in do-component-build scripts. Otherwise the result cannot be transitively consistent. We should not assume a Bigtop user will be building with a BOM full of conveniently already-released artifacts in public Maven repos (see above), or even with direct access to public networks. It would be inconvenient, but I could see an extra 'getting started' step of setting up a local Nexus or similar. However, when building from SHAs on a dev workstation, the local Maven cache actually seems the best option. Alternatively, we can declare this use case no longer supported. That would make me sad.

It would also be great if we can continue supporting Bigtop package builds on build servers and developer workstations without requiring any specific containerization technology (or any containerization at all). Giving people the option to use Docker-specific techniques is fine as long as it is totally optional.

On Fri, Jun 19, 2015 at 11:26 PM, Bruno Mahé bm...@apache.org wrote: Echoing both Nate and Evans, I would not limit ourselves based on the technology used for the build. However, I am not sure I completely follow option 3. We are doing that already for packages. For instance, if package A depends on Apache Zookeeper, then package A declares a dependency on Apache Zookeeper and includes symlinks to the Apache Zookeeper library provided by the Apache Zookeeper package. Thanks, Bruno

On 06/19/2015 12:47 PM, n...@reactor8.com wrote: Echoing Evans, I think we should not be worried about stateless vs. non-stateless containers; the core idea and need is to optimize the build process and maximize re-use, whether on host machines, in containers, or across build environments.
Added a sub-task with Olaf's idea to Evans' umbrella CI task, currently marked for 1.1: https://issues.apache.org/jira/browse/BIGTOP-1906

*From:* Evans Ye [mailto:evan...@apache.org] *Sent:* Friday, June 19, 2015 7:16 AM *To:* user@bigtop.apache.org *Subject:* Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?

I think it's not a problem that the container is not stateless. No matter what, we should have CI jobs that build all the artifacts and store them as official repos. You point out an important thing: mvn install is the key feature for propagating self-patched components around. If we disable this, then there's no reason to build jars ourselves. I'm +1 to option 2.

On 2015-06-19 at 5:59 AM, Olaf Flebbe o...@oflebbe.de wrote: On 18.06.2015 at 23:57, jay vyas jayunit100.apa...@gmail.com wrote: You can easily share the artifacts with a docker shared volume in the container: EXPORT M2_HOME=/container/m2/ followed by docker build -v ~/.m2/ /container/m2/. This will put the mvn jars onto the host rather than into the guest container, so that they persist. That's not the point. Containers are not stateless any more. Olaf
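The shared-volume idea in the exchange above can be made concrete. One correction worth noting: `docker build` does not accept a `-v` flag; bind mounts are supplied to `docker run`. The sketch below only assembles and prints the command, since the image name and Gradle task are hypothetical placeholders, not real Bigtop targets:

```shell
# Sketch: persist the local Maven cache across container builds by
# bind-mounting the host's ~/.m2 into the container. Jars placed there
# by `mvn install` inside the container then outlive the container.
M2_CACHE="${HOME}/.m2"
# Hypothetical image and task names, for illustration only.
BUILD_CMD="docker run --rm -v ${M2_CACHE}:/root/.m2 bigtop-build-slave ./gradlew zookeeper-rpm"
# On a host with Docker installed, this is the command you would run:
echo "${BUILD_CMD}"
```

The trade-off Olaf raises still applies: once the cache is shared, the container is no longer stateless, and builds can see artifacts left behind by earlier runs.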
Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
You can easily share the artifacts with a docker shared volume in the container: EXPORT M2_HOME=/container/m2/ followed by docker build -v ~/.m2/ /container/m2/. This will put the mvn jars onto the host rather than into the guest container, so that they persist.

On Thu, Jun 18, 2015 at 5:32 PM, Olaf Flebbe o...@oflebbe.de wrote: Thanks Nate for this focused writeup! Yeah, maybe it is time to reboot our brains ... In addition to Nate's points, I would like to attack this in bigtop 1.1.0: Building from source or downloading? … However, we have a substantial problem hidden deep in the CI "2.0" approach using containers. You may know that we place artifacts (i.e. jars) we built with bigtop into the local maven cache ~/.m2 (look for mvn install in do-component-build). The idea is that later maven builds will pick up these artifacts and use them rather than downloading them from maven central. Placing artifacts into ~/.m2 will not have any effect if we use CI containers the way we do now: the maven cache ~/.m2 is lost when the container ends. [This triggered the misfeature in JIRA BIGTOP-1893, BTW: gradle rpm/apt behaved differently from a container build with artifacts from maven central.]

Option 1) Remove mvn install from all do-component-builds. Results: + We compile projects the way the upstream developer does. - Local fixes and configurations will not be propagated. Questions: if we do not try to reuse our build artifacts within the compile, we have to ask ourselves "why do we compile projects at all?". We can build a great test of whether someone else has touched / manipulated the maven central cache if we compare artifacts, but is this really the point of compiling ourselves?

Option 2) Use mvn install and reuse artifacts even in containers. Consequences: - Containers are not stateless any more. - We have to add dependencies to CI jobs so they run in order. - Single components may break the whole compile process.
- Compile does not scale any more.

My opinion: the way we do "mvn install" now, simply tainting the maven cache, does not seem like a really controlled way to propagate artifacts to me.

Option 3) Use 1) but reuse artifacts in packages by placing symlinks and dependencies between them. - Packages will break with subtle problems if we symlink artifacts from different releases.

Neither Option 1, Option 2 nor Option 3 seems a clever way to fix the problem. Would like to hear comments regarding this issue: in my humble opinion we should follow Option 2 with all the grave consequences. But maybe we should rework mvn install by placing the artifacts with a Bigtop-specific name / groupId into the maven cache and uploading them to maven central. Olaf

On 18.06.2015 at 08:26, n...@reactor8.com wrote: Building on conversations pre/during/post ApacheCon, and looking at the post-1.0 bigtop focus and efforts, I want to lay out a few things and get people's comments. There seems to be some consensus that the project can look towards serving end application/data developers more going forward, while continuing the tradition of the project's build/pkg/test/deploy roots. I have spent the past couple of months, and heavily the past 3 or so weeks, talking to many different potential end users at meetups, conferences, etc., and also having some great conversations with commercial open source vendors that are interested in what a future bigtop can be and what it could provide to users. I believe we need to put some focused effort into a few foundational things to put the project in a position to move faster and attract a wider range of users as well as new contributors.

--- CI 2.0 --- The start of this is already underway, based on the work Roman started last year and the continuing effort with the new setup and enhancements to the bigtop AWS infrastructure; Evans has been pushing this along into the 1.0 release. The speed of getting new packages built and up to date needs to increase so releases can happen at a regular clip, even looking towards user-friendly ad-hoc bigtop builds where users could quickly choose the 2, 3, 4, etc. components they want and have a stack around that. Related to this, I am hoping the group can come to some idea/agreement on semver-style versioning for the project post-1.0. I think this could set a path forward for releases that can happen faster, while not holding up the whole train if a single smaller component has a couple of issues that can't/won't be resolved by the main stakeholders or interested parties in said component. An example might be a new Pig or Sqoop having issues; the 1.2 release would still go out the door, with 1.2.1 coming days/weeks later once the new Pig or Sqoop was fixed up.

- Proper package repository hosting - I put together a little test setup based on the 0.8 assets, we can
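Olaf's suggested rework of Option 2 above — publishing locally built jars under a Bigtop-specific groupId instead of silently overwriting the upstream coordinates in the cache — could look roughly like the sketch below. Every coordinate here is a made-up illustration, not an agreed convention, and the sketch only assembles and prints the Maven command:

```shell
# Hypothetical sketch: install a Bigtop-built jar into the local Maven
# repository under a Bigtop-specific groupId/version, so it can never
# be confused with the upstream artifact of the same name.
GROUP_ID="org.apache.bigtop.thirdparty"   # assumed naming, not decided
VERSION="3.4.6-bigtop"                    # version tagged as a Bigtop build
INSTALL_CMD="mvn install:install-file -Dfile=build/zookeeper.jar \
-DgroupId=${GROUP_ID} -DartifactId=zookeeper \
-Dversion=${VERSION} -Dpackaging=jar"
# On a machine with Maven and the built jar present, you would run:
echo "${INSTALL_CMD}"
```

do-component-build scripts would then reference the Bigtop coordinates explicitly, so a stale or tainted cache entry under the upstream groupId can never be picked up by accident.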
Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
I personally like the idea of including mesos and docker stuff as options we provide to users. No doubt these solutions can gain more eyeballs and attract new users/contributors to the bigtop family. As mentioned above, the use case seems not clear yet, but there's a chance that people will start to adopt them because of bigtop's support. Coming back to earth, to be honest I don't need things like mesos either. So including it would be a little bit difficult since there's no real demand for it so far. Probably people who are interested in the technology can start with an alpha/experimental feature and see if lots of people have interest in it. The feature should be able to be pushed into our code base as long as there's a maintainer committed to maintaining it. I can pair program with Jay if you'd like to work on it. :)

On 2015-06-17 at 3:02 AM, Andrew Purtell apurt...@apache.org wrote: "thanks andy - i agree with most of your opinions around continuing to build standard packages.. but can you clarify what was offensive ? must be a misinterpretation somewhere." Sure. A bit offensive. "gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one" - This statement deprecates the utility of the labors of the rest of the Hadoop ecosystem in favor of Gridgain and Spark. As a gross generalization it's unlikely to be a helpful statement in any case. It's fine if we all have our favorites, of course. I think we're set up well to empirically determine winners and losers; we don't need to make partisan statements. Those components that get some user interest in the form of contributions that keep them building and happy in Bigtop will stay in. Those that do not get the necessary attention will have to be culled out over time, when and if they fail to compile or pass integration tests.
On Mon, Jun 15, 2015 at 11:42 AM, jay vyas jayunit100.apa...@gmail.com wrote: thanks andy - i agree with most of your opinions around continuing to build standard packages.. but can you clarify what was offensive ? must be a misinterpretation somewhere. 1) To be clear, i am 100% behind supporting the standard hadoop build rpms that we have now. That's the core product and will be for the foreseeable future, absolutely! 2) The idea (and it's just an idea i want to throw out - to keep us on our toes) is that some folks may be interested in hacking around, in a separate branch, on some bleeding-edge bigdata deployments which attempt to incorporate resource managers and containers as first-class citizens. Again this is all just ideas - not in any way meant to derail the packaging efforts - but rather just to gauge folks' interest level in the bleeding edge, docker, mesos, simplified processing stacks, and so on.

On Mon, Jun 15, 2015 at 12:39 PM, Andrew Purtell apurt...@apache.org wrote: "gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one" - If something like this becomes the official position of the Bigtop project, some day, then it will turn off people. I can see where you are coming from, I think. Correct me if I'm wrong: we have limited bandwidth, we should move away from Roman et al.'s vision of Bigtop as an inclusive distribution of big data packages, and instead become highly opinionated and tightly focused. If that's accurate, I can sum up my concern as follows: to the degree we become more opinionated, the less we may have to look at in terms of inclusion - both software and user communities. For example, I find the above quoted statement a bit offensive as a participant on not-Spark and not-Gridgain projects. I roll my eyes sometimes at the Docker over-hype. Is there still a place for me here?

On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com wrote: Hi folks. Every few months, i try to reboot the conversation about the next generation of bigtop. There are 3 things which i think we should consider: a backplane (rather than deploying to machines), the meaning of the term ecosystem in a post-spark in-memory apocalypse, and containerization. 1) BACKPLANE: The new trend is to have a backplane that provides networking abstractions for you (mesos, kubernetes, yarn, and so on). Is it time for us to pick a resource manager? 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one). 3) CONTAINERS: we are doing a great job w/ docker in our build infra. Is it time to start experimenting with running docker tarballs? Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) just installed an HCFS implementation (gluster, HDFS, ...) alongside, say, (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] --- and then (3) do the integration testing of available mesos-framework plugins for ignite and spark underneath.
Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
You're right Bruno, I could, but I have no need of such a thing :) And in any case --- this thread is just about sharing ideas, letting the whole community speak up about their opinions on the future of bigtop. It's not about driving a particular project direction. Bigtop is a unique project in that we integrate a lot of tools in a rapidly changing landscape, so it's good to have some feelers out there to see what our users are thinking. Thanks all for the feedback, hope to get more!

On Jun 16, 2015, at 2:11 AM, Bruno Mahé bm...@apache.org wrote: On 06/15/2015 09:22 AM, jay vyas wrote: Hi folks. Every few months, i try to reboot the conversation about the next generation of bigtop. There are 3 things which i think we should consider: a backplane (rather than deploying to machines), the meaning of the term ecosystem in a post-spark in-memory apocalypse, and containerization. 1) BACKPLANE: The new trend is to have a backplane that provides networking abstractions for you (mesos, kubernetes, yarn, and so on). Is it time for us to pick a resource manager? 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one). 3) CONTAINERS: we are doing a great job w/ docker in our build infra. Is it time to start experimenting with running docker tarballs? Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) just installed an HCFS implementation (gluster, HDFS, ...) alongside, say, (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] --- and then (3) do the integration testing of available mesos-framework plugins for ignite and spark underneath.
If other folks are interested, maybe we could create the 1x or in-memory branch to start hacking on it sometime? Maybe even bring the flink guys in as well, as they are interested in bigtop packaging. -- jay vyas

I have roughly the same position as Andrew on that matter. What prevents you from starting something yourself to start hacking on it? Thanks, Bruno
Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
"Is it time for us to pick a resource manager?" Not if we want to be like a Debian for big data software. I'm not sure we want to limit our reach by being overly opinionated. With my user's hat on, if we don't package Hadoop and YARN, then I wouldn't have any use for Bigtop.

"Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening" A Bigtop user would only need to install the packages they would like to use, right? Is this an argument for exclusion? Exclusion of what?

"Is it time to start experimenting with running docker tarballs ?" This sounds fine as an additional target for builds, but not if it leads to a proposal to do away with the OS-native packaging. That's useful too. Containers are trendy but not useful or even appropriate for every environment or use case.

On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com wrote: Hi folks. Every few months, i try to reboot the conversation about the next generation of bigtop. There are 3 things which i think we should consider: a backplane (rather than deploying to machines), the meaning of the term ecosystem in a post-spark in-memory apocalypse, and containerization. 1) BACKPLANE: The new trend is to have a backplane that provides networking abstractions for you (mesos, kubernetes, yarn, and so on). Is it time for us to pick a resource manager? 2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one). 3) CONTAINERS: we are doing a great job w/ docker in our build infra. Is it time to start experimenting with running docker tarballs? Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) just installed an HCFS implementation (gluster, HDFS, ...)
alongside, say, (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] --- and then (3) do the integration testing of available mesos-framework plugins for ignite and spark underneath. If other folks are interested, maybe we could create the 1x or in-memory branch to start hacking on it sometime? Maybe even bring the flink guys in as well, as they are interested in bigtop packaging. -- jay vyas

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Rebooting the conversation on the Future of bigtop: Abstracting the backplane ? Containers?
On Mon, Jun 15, 2015 at 9:22 AM, jay vyas jayunit100.apa...@gmail.com wrote: Hi folks. Every few months, i try to reboot the conversation about the next generation of bigtop. There are 3 things which i think we should consider: a backplane (rather than deploying to machines), the meaning of the term ecosystem in a post-spark in-memory apocalypse, and containerization. 1) BACKPLANE: The new trend is to have a backplane that provides networking abstractions for you (mesos, kubernetes, yarn, and so on). Is it time for us to pick a resource manager?

Let me rephrase the above and see if we're talking about the same thing. To me your question is really about what a datacenter looks like to Bigtop. Today a datacenter looks to Bigtop like a bunch of individual nodes running some kind of Linux distribution. What you seem to be asking is whether it is time for us to embrace the vision of a datacenter that looks like mesos, etc. Correct? Also, I don't think you're suggesting that we drop the bread-n-butter of Bigtop, but I still need to make sure.

2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole hadoop ecosystem, and there is a huge shift to in-memory, monolithic stacks happening (i.e. gridgain or spark can do what 90% of the hadoop ecosystem already does, supporting streams, batch, sql all in one). Correct. That said, I'm not sure what it means for Bigtop.

3) CONTAINERS: we are doing a great job w/ docker in our build infra. Is it time to start experimenting with running docker tarballs? I think it is time, but

Combining 1+2+3 - i could see a useful bigdata upstream distro which (1) just installed an HCFS implementation (gluster, HDFS, ...) alongside, say, (2) mesos as a backplane for the tooling for [[ hbase + spark + ignite ]] --- and then (3) do the integration testing of available mesos-framework plugins for ignite and spark underneath.
If other folks are interested, maybe we could create the 1x or in-memory branch to start hacking on it sometime ?Maybe even bring the flink guys in as well, as they are interested in bigtop packaging. I'm actually very curious about use cases that folks might have around traditional Hadoop Distributions. What you're articulating above seems like one of those use cases, but at this point I'm sort of lost as to what's the most common use case. Thanks, Roman.