On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.

I don't see a direct answer to this question. Is there one? Pardon me if
I've not seen it, but I'm interested in the response. I ask because, IMO,
the "Hadoop" project is over-stuffed already; just see the length of the cc
list on this email. Ozone could be standalone. It is a coherent enough
effort.

Thanks,
St.Ack

> There are two main reasons why I think it should not.
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project. It is
> essentially a new storage system, with a different (than HDFS)
> architecture and separate S3-like APIs. This is really great - the world
> sure needs more distributed file systems. But it is not clear why Ozone
> should co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture, with the next steps presumably being HDFS-10419 and
> HDFS-11118. The design doc for the new architecture has never been
> published. I can only assume, based on some presentations and personal
> communications, that the idea is to use Ozone as block storage and
> re-implement the NameNode so that it stores only a partial namespace in
> memory, while the bulk of it (cold data) is persisted to local storage.
> Such an architecture makes me wonder whether it solves Hadoop's main
> problems. There are two main limitations in HDFS:
> a. The throughput of namespace operations, which is limited by the number
> of RPCs the NameNode can handle.
> b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority. The new
> architecture targets the object count problem, but at the expense of the
> RPC throughput.
> That seems to be the wrong resolution of the tradeoff.
> Also, based on the usage patterns on our large clusters, we read up to
> 90% of the data we write, so cold data is a small fraction and most of it
> must be cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the
> intrinsic problems of current HDFS.
>
> I will post my opinion in the Ozone jira. It should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersy...@hotmail.com>
> wrote:
>
> > Hello everyone,
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240)
> > to trunk. This feature implements an object store which can co-exist
> > with HDFS. Ozone is disabled by default. We have tested Ozone with
> > cluster sizes varying from 1 to 100 data nodes.
> >
> > The merge payload includes the following:
> > 1. All services and management scripts
> > 2. Object store APIs, exposed via both REST and RPC
> > 3. Master service UIs and command line interfaces
> > 4. Pluggable pipeline integration
> > 5. Ozone File System (a Hadoop-compatible file system implementation
> > that passes all FileSystem contract tests)
> > 6. Corona - a load generator for Ozone
> > 7. Essential documentation added to the Hadoop site
> > 8. Version-specific Ozone documentation, accessible via the service UI
> > 9. Docker support for Ozone, which enables faster development cycles
> >
> > To build and run Ozone using Docker, please follow the instructions on
> > this wiki page:
> > https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
> >
> > We have built a passionate and diverse community to drive this
> > feature's development. As a team, we have achieved significant progress
> > in the 3 years since the first JIRA for HDFS-7240 was opened in Oct
> > 2014.
> > So far, we have resolved almost 400 JIRAs, with contributions from 20+
> > contributors/committers from different countries and affiliations. We
> > also want to thank the large number of community members who were
> > supportive of our efforts, contributed ideas, and participated in the
> > design of Ozone.
> >
> > Please share your thoughts, thanks!
> >
> > -- Weiwei Yang
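[Editorial aside: limitation (b) above, the object count bounded by NameNode memory, lends itself to a rough back-of-envelope sketch. The ~150 bytes of heap per namespace object (file, directory, or block) is a widely cited HDFS rule of thumb, not a figure stated anywhere in this thread, and the exact number varies by Hadoop version and namespace shape.]

```python
# Back-of-envelope estimate: NameNode heap needed to track N namespace objects.
# Assumption: roughly 150 bytes of heap per object (file, directory, or block),
# a commonly cited HDFS rule of thumb; treat the result as an order of magnitude.
BYTES_PER_OBJECT = 150

def namenode_heap_gib(num_objects: int) -> float:
    """Approximate NameNode heap in GiB for a given object count."""
    return num_objects * BYTES_PER_OBJECT / 2**30

# A namespace of 500 million objects (files + blocks) needs on the order
# of 70 GiB of heap just for the namespace:
print(round(namenode_heap_gib(500_000_000)))  # ~70
```

At that scale the heap requirement shows why the object count (b) is a real ceiling, even though, as Konstantin argues above, RPC throughput (a) tends to bite first on busy clusters.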