On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.

I don't see a direct answer to this question. Is there one? Pardon me if
I've not seen it, but I'm interested in the response. I ask because, IMO,
the "Hadoop" project is over-stuffed already; just see the length of the cc
list on this email. Ozone could be standalone. It is a coherent enough
effort.

Thanks,
St.Ack

> There are two main reasons why I think it should not.
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project. It is
> essentially a new storage system, with a different (than HDFS)
> architecture and separate S3-like APIs. This is really great - the world
> sure needs more distributed file systems. But it is not clear why Ozone
> should co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture, with the next steps presumably being HDFS-10419 and
> HDFS-11118. The design doc for the new architecture has never been
> published. I can only assume, based on some presentations and personal
> communications, that the idea is to use Ozone as block storage and
> re-implement the NameNode so that it stores only a partial namespace in
> memory, while the bulk of it (cold data) is persisted to local storage.
> Such an architecture makes me wonder whether it solves Hadoop's main
> problems. There are two main limitations in HDFS:
> a. The throughput of namespace operations, which is limited by the number
> of RPCs the NameNode can handle.
> b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority. The new
> architecture targets the object count problem, but at the expense of the
> RPC throughput.
> That seems to be the wrong resolution of the tradeoff.
> Also, based on the usage patterns on our large clusters, we read up to
> 90% of the data we write, so cold data is a small fraction and most of it
> must be cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the
> intrinsic problems of current HDFS.
>
> I will post my opinion in the Ozone jira. It should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersy...@hotmail.com>
> wrote:
>
> > Hello everyone,
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240)
> > to trunk. This feature implements an object store which can co-exist
> > with HDFS. Ozone is disabled by default. We have tested Ozone with
> > cluster sizes varying from 1 to 100 data nodes.
> >
> > The merge payload includes the following:
> > 1. All services and management scripts
> > 2. Object store APIs, exposed via both REST and RPC
> > 3. Master service UIs and command line interfaces
> > 4. Pluggable pipeline integration
> > 5. Ozone File System (a Hadoop-compatible file system implementation
> > that passes all FileSystem contract tests)
> > 6. Corona - a load generator for Ozone
> > 7. Essential documentation added to the Hadoop site
> > 8. Version-specific Ozone documentation, accessible via the service UI
> > 9. Docker support for Ozone, which enables faster development cycles
> >
> > To build and run Ozone using Docker, please follow the instructions on
> > this wiki page:
> > https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
> >
> > We have built a passionate and diverse community to drive this
> > feature's development. As a team, we have achieved significant progress
> > in the 3 years since the first JIRA for HDFS-7240 was opened in Oct
> > 2014.
> > So far, we have resolved almost 400 JIRAs, with contributions from 20+
> > contributors/committers from different countries and affiliations. We
> > also want to thank the large number of community members who were
> > supportive of our efforts, contributed ideas, and participated in the
> > design of Ozone.
> >
> > Please share your thoughts, thanks!
> >
> > -- Weiwei Yang
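[Editorial aside: limitation (b) above, the object count bounded by NameNode memory, lends itself to a rough back-of-envelope sketch. The ~150 bytes of heap per namespace object (file, directory, or block) is a widely cited HDFS rule of thumb, not a figure stated anywhere in this thread, and the exact number varies by Hadoop version and namespace shape.]

```python
# Back-of-envelope estimate: NameNode heap needed to track N namespace objects.
# Assumption: roughly 150 bytes of heap per object (file, directory, or block),
# a commonly cited HDFS rule of thumb; treat the result as an order of magnitude.
BYTES_PER_OBJECT = 150

def namenode_heap_gib(num_objects: int) -> float:
    """Approximate NameNode heap in GiB for a given object count."""
    return num_objects * BYTES_PER_OBJECT / 2**30

# A namespace of 500 million objects (files + blocks) needs on the order
# of 70 GiB of heap just for the namespace:
print(round(namenode_heap_gib(500_000_000)))  # ~70
```

At that scale the heap requirement shows why the object count (b) is a real ceiling, even though, as Konstantin argues above, RPC throughput (a) tends to bite first on busy clusters.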