+1 (non-binding)
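
A non-binding +1 like the one above is recorded but does not decide the
outcome. The call for votes quoted below says that acceptance is a majority
vote in which only Incubator PMC member votes are binding, and that the vote
runs for at least 72 hours. What follows is a minimal sketch of that counting
rule, not an official ASF tool. It assumes that "majority" means more binding
+1 votes than binding -1 votes and that +0 counts for neither side. The `Vote`
record, the `tally` function and the voter names are hypothetical, and any
quorum or minimum-vote requirement is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# "at least 72 hours", taken from the call for votes
MIN_VOTE_WINDOW = timedelta(hours=72)


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical placeholder names below
    value: int     # +1, 0 or -1
    binding: bool  # True only for Incubator PMC members


def tally(votes, opened, now):
    """Return (passed, summary) under the assumptions stated in the lead-in."""
    binding = [v for v in votes if v.binding]
    summary = {
        "binding_plus": sum(1 for v in binding if v.value > 0),
        "binding_minus": sum(1 for v in binding if v.value < 0),
        "binding_abstain": sum(1 for v in binding if v.value == 0),
        "non_binding": len(votes) - len(binding),
    }
    window_elapsed = (now - opened) >= MIN_VOTE_WINDOW
    # Assumption: simple majority of binding votes; +0 counts for neither side.
    passed = window_elapsed and summary["binding_plus"] > summary["binding_minus"]
    return passed, summary


opened = datetime(2018, 7, 5, 19, 22, 19)  # timestamp of the call for votes
votes = [
    Vote("pmc-member-a", +1, binding=True),
    Vote("pmc-member-b", +1, binding=True),
    Vote("pmc-member-c", -1, binding=True),
    Vote("list-subscriber", +1, binding=False),  # counted, but not binding
]

early, _ = tally(votes, opened, opened + timedelta(hours=71))
final, summary = tally(votes, opened, opened + timedelta(hours=72))
print(early)    # False: the 72-hour window has not elapsed yet
print(final)    # True: 2 binding +1 against 1 binding -1
print(summary["non_binding"])  # 1
```

Non-binding votes are kept in the summary rather than dropped, since they
still signal community interest even though they do not affect the result.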

On 2018/07/05 19:22:19, Dave Fisher <d...@comcast.net> wrote:
> Hi All,
>
> I would like to start a VOTE to bring the Doris project in as an Apache
> Incubator podling.
>
> The ASF voting rules are described here:
>
> https://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote for
> which only Incubator PMC member votes are binding.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
> [] +1 Accept Doris into the Apache Incubator
> [] +0 Abstain.
> [] -1 Do not accept Doris into the Apache Incubator because ...
>
> The proposal is listed below, but you can also access it on the wiki:
>
> https://wiki.apache.org/incubator/DorisProposal
>
> Best regards,
> Dave
>
> = Apache Doris =
>
> == Abstract ==
>
> Doris is an MPP-based interactive SQL data warehouse for reporting and
> analysis.
>
> == Proposal ==
>
> We propose to contribute the Doris codebase and associated artifacts (e.g.
> documentation, website content, etc.) to the Apache Software Foundation, and
> aim to build an open community around Doris’s continued development in the
> ‘Apache Way’.
>
> === Overview of Doris ===
>
> Doris’s implementation consists of two daemons: Frontend (FE) and Backend
> (BE).
>
> The **Frontend daemon** consists of a query coordinator and a catalog
> manager. The query coordinator is responsible for receiving users’ SQL
> queries, compiling them and managing their execution. The catalog manager
> is responsible for managing metadata such as databases, tables, partitions
> and replicas. Several frontend daemons can be deployed to provide fault
> tolerance and load balancing.
>
> The **Backend daemon** stores the data and executes the query fragments.
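
The split just described, a Frontend that holds the catalog and coordinates
queries and Backends that hold the data and execute fragments, can be pictured
with a toy model. This is a minimal sketch and not Doris code: the `Frontend`,
`Backend` and `Fragment` classes, the one-partition-per-backend layout and the
single supported query shape (a filtered count) are all hypothetical. They are
chosen only to show which side owns metadata and which side touches rows.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A unit of work the Frontend hands to one Backend (hypothetical shape)."""
    table: str
    partition: int
    min_value: int  # count rows whose value is >= min_value


@dataclass
class Backend:
    """Stores data and executes query fragments."""
    name: str
    # (table, partition) -> list of integer rows
    partitions: dict = field(default_factory=dict)

    def execute(self, fragment):
        rows = self.partitions[(fragment.table, fragment.partition)]
        return sum(1 for value in rows if value >= fragment.min_value)


class Frontend:
    """Query coordinator plus catalog manager."""

    def __init__(self, backends):
        self.backends = {b.name: b for b in backends}
        # table -> {partition: backend name}; metadata only, no rows
        self.catalog = {}

    def register_partition(self, table, partition, backend_name):
        self.catalog.setdefault(table, {})[partition] = backend_name

    def plan(self, table, min_value):
        # "Compiling" here just means one fragment per catalogued partition.
        return [
            (backend_name, Fragment(table, partition, min_value))
            for partition, backend_name in sorted(self.catalog[table].items())
        ]

    def count_where_at_least(self, table, min_value):
        partials = [
            self.backends[name].execute(fragment)
            for name, fragment in self.plan(table, min_value)
        ]
        return sum(partials)  # the coordinator merges the partial results


be1 = Backend("be1", {("clicks", 0): [3, 8, 12]})
be2 = Backend("be2", {("clicks", 1): [1, 9, 20, 25]})
fe = Frontend([be1, be2])
fe.register_partition("clicks", 0, "be1")
fe.register_partition("clicks", 1, "be2")

print(fe.count_where_at_least("clicks", 9))  # 4
```

The Frontend never reads rows itself. It only consults its catalog to decide
where fragments go and then merges what comes back, which is the separation
the proposal describes.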
> Many backend daemons can also be deployed to provide scalability and
> fault tolerance.
>
> A typical Doris cluster is generally composed of several frontend daemons
> and dozens to hundreds of backend daemons.
>
> Users can use MySQL client tools to connect to any frontend daemon and
> submit a SQL query. The Frontend receives the query and compiles it into
> query plans executable by the Backends. The Frontend then sends the query
> plan fragments to the Backends. Each Backend builds a query execution DAG.
> Data is fetched and pipelined into the DAG. The final result is sent to the
> client via the Frontend. The distribution of query fragment execution takes
> minimizing data movement and maximizing scan locality as its main goal.
>
> == Background ==
>
> At Baidu, prior to Doris, different tools were deployed to solve diverse
> requirements in many ways. When a use case required the simultaneous
> availability of capabilities that could not all be provided by a single
> tool, users were forced to build hybrid architectures that stitch multiple
> tools together, but we believe that they shouldn’t need to accept such
> inherent complexity. A storage system built to provide great performance
> across a broad range of workloads provides a more elegant solution to the
> problems that hybrid architectures aim to solve. Doris is that solution.
>
> Doris is designed to be a simple, single, tightly coupled system that does
> not depend on other systems. Doris provides high-concurrency, low-latency
> point query performance, but also provides high-throughput queries for
> ad-hoc analysis. Doris provides bulk-batch data loading, but also provides
> near real-time mini-batch data loading.
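
Returning to the query path in the Overview: the stated goal for distributing
fragment execution is to minimize data movement and maximize scan locality.
One way to picture that goal is a greedy placement that prefers a backend
already holding a replica of the partition and only falls back to a remote
read when no such backend is available. This is a minimal sketch under stated
assumptions. The greedy least-loaded heuristic, the `place_fragments` function
and the partition and backend names are hypothetical and are not taken from
the Doris scheduler.

```python
from collections import Counter


def place_fragments(replicas, alive):
    """Assign one scan fragment per partition to a backend.

    replicas: partition id -> list of backends that hold a copy
    alive:    set of backends currently able to run fragments
    Returns (assignment, remote_partitions).
    """
    if not alive:
        raise ValueError("no backend is alive")
    load = Counter({backend: 0 for backend in alive})
    assignment = {}
    remote = []
    for partition in sorted(replicas):
        local = [b for b in replicas[partition] if b in alive]
        candidates = local or sorted(alive)
        if not local:
            remote.append(partition)  # data would have to cross the network
        # Prefer the least-loaded candidate; tie-break on name for determinism.
        chosen = min(candidates, key=lambda b: (load[b], b))
        assignment[partition] = chosen
        load[chosen] += 1
    return assignment, remote


replicas = {
    "p0": ["be1", "be2"],
    "p1": ["be2", "be3"],
    "p2": ["be1", "be3"],
    "p3": ["be4"],  # the only copy sits on a backend that is down
}
alive = {"be1", "be2", "be3"}

assignment, remote = place_fragments(replicas, alive)
print(assignment)  # {'p0': 'be1', 'p1': 'be2', 'p2': 'be3', 'p3': 'be1'}
print(remote)      # ['p3']
```

Three of the four scans run where the data already lives, and the load stays
spread across the backends. Only `p3` needs a remote read, because its sole
replica holder is unavailable.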
> Doris also provides high availability, reliability, fault tolerance, and
> scalability.
>
> == Rationale ==
>
> Doris mainly integrates the technology of Google Mesa and Apache Impala.
>
> Mesa is a highly scalable analytic data storage system that stores critical
> measurement data related to Google's Internet advertising business. Mesa is
> designed to satisfy a complex and challenging set of users’ and systems’
> requirements, including near real-time data ingestion and query ability, as
> well as high availability, reliability, fault tolerance, and scalability
> for large data and query volumes.
>
> Impala is a modern, open-source MPP SQL engine architected from the ground
> up for the Hadoop data processing environment. At present, by virtue of its
> superior performance and rich functionality, Impala is comparable to many
> commercial MPP database query engines. Mesa can satisfy many of our storage
> requirements, but Mesa itself does not provide a SQL query engine; Impala
> is a very good MPP SQL query engine, but it lacks a perfect distributed
> storage engine. So in the end we chose the combination of these two
> technologies.
>
> Learning from Mesa’s data model, we developed a distributed storage engine.
> Unlike Mesa, this storage engine does not rely on any distributed file
> system. We then deeply integrated this storage engine with the Impala query
> engine. Query compiling, query execution coordination and catalog
> management of the storage engine are integrated into the frontend daemon;
> query execution and data storage are integrated into the backend daemon.
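
The storage engine above is described as learning from Mesa's data model. In
the published description of Mesa, a table maps a key to a value, rows with
the same key are combined by an aggregation function, and updates arrive as
batches that each get a version number. A query at a given version sees every
batch up to that version. The following is a minimal sketch of that idea only.
It is an assumption that Doris follows the same shape in detail. `AggTable`,
`apply_batch` and `query` are hypothetical names, SUM is the only aggregation
shown, compaction is left out, and the sample rows are made up.

```python
class AggTable:
    """Key -> SUM-aggregated value, updated in versioned batches."""

    def __init__(self):
        # list of (version, {key: value}) in commit order
        self.deltas = []

    def apply_batch(self, rows):
        """Commit one batch as a unit and return its version number."""
        delta = {}
        for key, value in rows:
            # pre-aggregate rows that share a key inside the batch
            delta[key] = delta.get(key, 0) + value
        version = len(self.deltas)
        self.deltas.append((version, delta))
        return version

    def query(self, key, version=None):
        """Aggregate every delta up to and including `version`."""
        if version is None:
            version = len(self.deltas) - 1
        return sum(d.get(key, 0) for v, d in self.deltas if v <= version)


table = AggTable()
key = ("2018-07-05", "ad-1")  # (date, ad id) -> click count, illustrative only

v0 = table.apply_batch([(key, 10), (("2018-07-05", "ad-2"), 4)])
v1 = table.apply_batch([(key, 5)])
v2 = table.apply_batch([(key, 1), (key, 2)])

print(table.query(key, version=v0))  # 10
print(table.query(key, version=v1))  # 15
print(table.query(key))              # 18
```

Under this model a bulk load and a mini-batch load differ only in batch size
and frequency, since both are committed as one versioned delta. That is one
plausible reading of how bulk and near real-time loading can coexist.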
> With this integration, we implemented a single, full-featured,
> high-performance, state-of-the-art MPP database while maintaining
> simplicity.
>
> == Current Status ==
>
> Doris is already an open source project on GitHub
> (https://github.com/baidu/palo).
>
> === Meritocracy ===
>
> Doris has been deployed in production at Baidu and is used by more than 200
> lines of business. It has demonstrated great performance benefits and has
> proved to be a better way to do reporting and analysis based on big data.
> Still, we look forward to growing a rich user and developer community.
>
> === Community ===
>
> Doris seeks to build developer and user communities during incubation.
>
> Doris makes use of Apache Impala. It was identified during early review of
> the proposal that the Doris community will need to work with Impala to
> define a suitable API.
>
> === Core Developers ===
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
> * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
>
> === Alignment ===
>
> Doris is related to several other Apache projects:
>
> * Doris can also read data stored in Apache Hadoop clusters powered by the
>   HDFS filesystem.
> * Doris is closely integrated with Impala, which has graduated from the
>   Apache Incubator.
> * Doris uses Apache Thrift as its RPC and serialization framework of
>   choice.
>
> == Known Risks ==
>
> === Orphaned Products ===
>
> The core developers of the Doris team plan to work full time on this
> project.
> There is very little risk of Doris getting orphaned since at least one
> large company (Baidu) is using it extensively in production. For example,
> there are currently more than 200 use cases using Doris in production.
> Furthermore, since Doris was open sourced at the beginning of October 2017,
> it has received more than 660 stars and been forked nearly 170 times. We
> plan to extend and diversify this community further through Apache.
>
> === Inexperience with Open Source ===
>
> The core developers are all active users and followers of open source. They
> are already committers and contributors to the Doris GitHub project. All
> have been involved with source code that has been released under an open
> source license, and several of them also have experience developing code in
> an open source environment. Though the core set of developers does not have
> Apache open source experience, there are plans to onboard individuals with
> Apache open source experience onto the project.
>
> === Homogenous Developers ===
>
> Most of the core developers are from Baidu, but after Doris was open
> sourced, it received a lot of bug fixes and enhancements from other
> developers not working at Baidu.
>
> === Reliance on Salaried Developers ===
>
> Baidu has invested in Doris as the OLAP solution, and some of its key
> engineers are working full time on the project. In addition, since there is
> a growing big data need for scalable OLAP solutions, we look forward to
> other Apache developers and researchers contributing to the project. Also
> key to addressing the risk associated with relying on salaried developers
> from a single entity is to increase the diversity of the contributors and
> actively lobby for domain experts in the BI space to contribute.
> Apache Doris intends to do this.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Doris is proposing to enter incubation at Apache in order to help efforts
> to diversify the committer base, not so much to capitalize on the Apache
> brand. The Doris project is already in production use inside Baidu, but is
> not expected to be a Baidu product for external customers. As such, the
> Doris project is not seeking to use the Apache brand as a marketing tool.
>
> == Documentation ==
>
> Information about Doris can be found at https://github.com/baidu/palo. The
> following links provide more information about Doris in open source:
>
> * Doris wiki site: https://github.com/baidu/palo/wiki
> * Codebase at GitHub: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/Doris/wiki/palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/palo-FAQ
>
> == Initial Source ==
>
> Doris has been under development since 2017 by a team of engineers at Baidu
> Inc. It is currently hosted on GitHub.com under an Apache license at
> https://github.com/baidu/palo.
>
> == External Dependencies ==
>
> Doris has the following external dependencies:
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * Apache Ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> == Required Resources ==
>
> === Mailing List ===
>
> There are currently no mailing lists.
> The usual mailing lists are expected to be set up when entering incubation:
>
> * priv...@doris.incubator.apache.org
> * d...@doris.incubator.apache.org
> * comm...@doris.incubator.apache.org
>
> === Subversion Directory ===
>
> Upon entering incubation, we want to move (or copy) the existing repo from
> https://github.com/baidu/palo to Apache infrastructure at
> https://github.com/apache/incubator-doris.
>
> === Issue Tracking ===
>
> Doris currently uses GitHub to track issues. We would like to continue to
> do so while we discuss migration possibilities with the ASF Infra
> committee.
>
> === Other Resources ===
>
> The existing code already has unit tests, so we will make use of existing
> Apache continuous testing infrastructure. The resulting load should not be
> very large.
>
> == Initial Committers ==
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu dot com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail dot com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu dot com)
> * De Li (https://github.com/lide-reed, mailtolide@sina dot com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu dot com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu dot com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail dot com)
> * Sijie Guo (guosijie@gmail dot com)
> * Zheng Shao (zs...@apache.org)
>
> == Affiliations ==
>
> The initial committers are employees of Baidu Inc.
>
> == Sponsors ==
>
> === Champion ===
>
> * Dave Fisher, w...@apache.org
>
> === Nominated Mentors ===
>
> * Luke Han, luke...@apache.org
> * Dave Fisher, w...@apache.org
> * Willem Jiang, ningji...@apache.org
>
> === Sponsoring Entity ===
>
> We are requesting the Incubator to sponsor this project.