Hi folks. Every few months, I try to reboot the conversation about the next generation of Bigtop.
There are three things I think we should consider: a backplane (rather than deploying directly to machines), what the term "ecosystem" means in a post-Spark, in-memory apocalypse, and containerization.

1) BACKPLANE: The new trend is to have a backplane (Mesos, Kubernetes, YARN, and so on) that provides resource-management and networking abstractions for you. Is it time for us to pick a resource manager?

2) ECOSYSTEM?: Nowadays folks don't necessarily need the whole Hadoop ecosystem, and there is a huge shift toward in-memory, monolithic stacks (e.g., GridGain or Spark can do what 90% of the Hadoop ecosystem already does, supporting streaming, batch, and SQL all in one).

3) CONTAINERS: We are doing a great job with Docker in our build infrastructure. Is it time to start experimenting with running Docker tarballs?

Combining 1+2+3, I could see a useful big data upstream distro that (1) just installs an HCFS implementation (Gluster, HDFS, ...) alongside, say, (2) Mesos as a backplane for the tooling for [[ HBase + Spark + Ignite ]], and then (3) does the integration testing of the available Mesos framework plugins for Ignite and Spark underneath.

If other folks are interested, maybe we could create a "1x" or "in-memory" branch to start hacking on it sometime? Maybe we could even bring the Flink guys in as well, since they are interested in Bigtop packaging.

-- jay vyas
