Hi Michel,

That is a valid point as well. At the moment I am not aware of anyone who is
looking into the short-circuit read feature or has plans to do so, which is
why I think it will take time before it gets implemented. If you take this
on, the community will certainly be grateful. ;)
On the other hand, let's give it a day or two to see if anyone else wants to
comment on it :)

Pifta

On Mon, 20 Jul 2020 at 15:11, Michel Sumbul <michelsum...@gmail.com> wrote:

> Hi Pifta,
>
> It's true that the technology is moving toward a separation of compute and
> storage, which is not a bad thing depending on the use case and the workload.
> However, when it comes to processing petabytes of data, data locality becomes
> a key element, and short-circuit reads are part of that for me.
> I definitely need to start looking at the Ozone code :)
>
> Thanks for all the answers!
> Michel
>
> On Mon, 20 Jul 2020 at 13:23, István Fajth <fapi...@gmail.com> wrote:
>
> > Hi Michel,
> >
> > Currently Ozone does not support short-circuit reads. It is on the roadmap,
> > but as we transition to a world of segregated storage and compute, it is
> > not the most important item, AFAIK.
> >
> > It is a FileSystem-related feature; in object stores it does not apply at
> > all. As we continue developing the FileSystem interface it will certainly
> > become important, but probably later rather than sooner.
> > On the other hand, as with a placement policy or any other improvement,
> > you should of course feel free to add this functionality yourself, and the
> > community will be happy to help and review :)
> >
> > Pifta
> >
> > On Mon, 20 Jul 2020 at 13:53, Michel Sumbul <michelsum...@gmail.com> wrote:
> >
> > > Thanks Pifta, that's really clear!
> > >
> > > If you don't mind a last question on data locality: does Ozone support
> > > short-circuit reads like HDFS?
> > > If not, is it on the roadmap? Short-circuit reads provide a significant
> > > performance boost in the HDFS world; do you think it will be the same
> > > for Ozone?
> > >
> > > Thanks,
> > > Michel
> > >
> > > On Tue, 14 Jul 2020 at 22:02, István Fajth <fapi...@gmail.com> wrote:
> > >
> > > > Hi Michel,
> > > >
> > > > At the moment, placement policy is an interesting topic.
> > > > In Ozone, placement is considered in terms of containers, not blocks;
> > > > blocks are sub-container structures.
> > > > A container has a lifecycle. While it is open, the pipeline attached
> > > > to it defines the placement of data. If racks are defined and we are
> > > > talking about replication factor 3 pipelines, pipeline placement puts
> > > > two container replicas into one rack and one into another rack.
> > > > This behaviour is hard-wired, and pipelines are balanced between
> > > > DataNodes. If no racks are defined, or just one rack is defined,
> > > > pipeline placement falls back to random placement that considers the
> > > > space available on DataNodes and favors nodes with more free space.
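The open-container (pipeline) placement rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Ozone's actual code (Ozone is written in Java and its real placement classes are not reproduced here): the `DataNode` type, the `pick_weighted` and `place_pipeline` functions, and weighting the random choice by free bytes are all hypothetical simplifications.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    # Hypothetical, simplified view of a DataNode (not an Ozone class).
    name: str
    rack: str        # e.g. "/rack1"; "/default-rack" when no racks are defined
    free_space: int  # bytes available


def pick_weighted(candidates, rng):
    """Pick one node at random, favouring nodes with more free space."""
    weights = [max(n.free_space, 1) for n in candidates]  # avoid all-zero weights
    return rng.choices(candidates, weights=weights, k=1)[0]


def place_pipeline(nodes, rng=None):
    """Choose 3 DataNodes for a replication factor 3 pipeline (sketch)."""
    if len(nodes) < 3:
        raise ValueError("need at least 3 DataNodes")
    rng = rng or random.Random()
    by_rack = defaultdict(list)
    for n in nodes:
        by_rack[n.rack].append(n)
    racks_with_two = [r for r, members in by_rack.items() if len(members) >= 2]

    if len(by_rack) < 2 or not racks_with_two:
        # One rack (or no rack can hold two replicas): fall back to
        # space-aware random placement on 3 distinct nodes.
        remaining = list(nodes)
        chosen = []
        for _ in range(3):
            pick = pick_weighted(remaining, rng)
            chosen.append(pick)
            remaining.remove(pick)
        return chosen

    # Two or more racks: 2 replicas in one rack, 1 in another rack.
    local_rack = rng.choice(racks_with_two)
    local = list(by_rack[local_rack])
    first = pick_weighted(local, rng)
    local.remove(first)
    second = pick_weighted(local, rng)
    remote = [n for n in nodes if n.rack != local_rack]
    third = pick_weighted(remote, rng)
    return [first, second, third]
```

On a two-rack cluster every call returns two nodes from one rack and one from the other; when all nodes share "/default-rack", the sketch degrades to a free-space-weighted random choice.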
> > > >
> > > > When a container gets closed, its replicas are managed by the
> > > > ReplicationManager, which has a configurable policy. There are three
> > > > policies at the moment: random, available-space-aware random, and
> > > > rack-aware.
> > > > The ReplicationManager moves closed containers as needed if their
> > > > replication violates the policy, and creates or removes replicas
> > > > when under- or over-replication occurs.
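The corrective decisions described above (replicate when under-replicated, delete when over-replicated, move when the count is right but the placement violates the policy) can be sketched as below. This is a hypothetical Python illustration, not the ReplicationManager's real logic: `Action`, `required_racks` and `check_closed_container` are invented names, and the "factor 3 needs at least 2 racks" rule is an assumption modelled on the rack-aware behaviour described in this thread.

```python
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical corrective action: "replicate", "delete" or "move".
    kind: str
    count: int = 1


def required_racks(factor, total_racks):
    """Assumed rack-aware rule: replicas must span at least 2 racks when possible."""
    return min(2, factor, total_racks)


def check_closed_container(replica_racks, factor, total_racks):
    """Return the corrective actions for one closed container (sketch).

    replica_racks: the rack of each existing replica, e.g. ["/r1", "/r1", "/r2"].
    """
    actions = []
    n = len(replica_racks)
    if n < factor:
        actions.append(Action("replicate", factor - n))
    elif n > factor:
        actions.append(Action("delete", n - factor))
    elif len(set(replica_racks)) < required_racks(factor, total_racks):
        # Right number of copies, but they violate the rack-aware policy.
        actions.append(Action("move"))
    return actions
```

A real implementation would also take the policy into account when choosing the target of a new replica; the sketch only shows which situations trigger which kind of work.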
> > > >
> > > > This is because Ozone aims to balance write I/O by balancing the
> > > > pipelines. Read I/O is balanced by the random placement within the
> > > > rules defined by the policy.
> > > >
> > > > Ozone needs to harmonize pipeline placement and container placement
> > > > in the future, as we certainly want to add more policies, but at the
> > > > moment this is how placement works.
> > > >
> > > > Regarding balancing: at the moment we do not have balancing logic
> > > > built in, and we do not have a balancer tool like the one in HDFS.
> > > > It is part of the roadmap, however, and you can bet that any
> > > > balancing logic will have to consider at least the placement policy
> > > > configured for closed containers.
> > > >
> > > > If you need a policy like the one you mentioned, the closed container
> > > > policy is pluggable, so you can write your own, or even contribute it
> > > > to the project if you want.
> > > > For now, though, you need to consider the extra load: if the custom
> > > > policy is violated by the pipeline placement, then at container
> > > > closure the containers have to be moved to fit the closed container
> > > > placement policy.
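To make the cost of a custom policy concrete, here is a sketch of a "one replica per rack" closed-container policy and the number of moves it would trigger when containers written through 2+1 pipelines are closed. This is Python pseudo-plugin code for illustration only: `ClosedContainerPolicy`, `OneReplicaPerRack` and `closure_load` are hypothetical names and do not correspond to Ozone's real Java plug-in interface.

```python
from abc import ABC, abstractmethod


class ClosedContainerPolicy(ABC):
    """Hypothetical Python stand-in for a pluggable closed-container policy."""

    @abstractmethod
    def moves_needed(self, replica_racks, total_racks):
        """How many replicas must be moved to satisfy the policy."""

    def is_satisfied(self, replica_racks, total_racks):
        return self.moves_needed(replica_racks, total_racks) == 0


class OneReplicaPerRack(ClosedContainerPolicy):
    """Put every replica in a different rack, as far as the racks allow."""

    def moves_needed(self, replica_racks, total_racks):
        wanted_racks = min(len(replica_racks), total_racks)
        return max(0, wanted_racks - len(set(replica_racks)))


def closure_load(policy, closed_containers, total_racks):
    """Total replica moves triggered when these containers are closed."""
    return sum(policy.moves_needed(racks, total_racks) for racks in closed_containers)
```

With three or more racks, every container that the pipeline placed as 2+1 needs one replica moved at closure, so closing ten such containers costs ten moves: that is the extra load mentioned above.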
> > > >
> > > > Pifta
> > > >
> > > > On Tue, 14 Jul 2020 at 15:38, Michel Sumbul <michelsum...@gmail.com> wrote:
> > > >
> > > > > Hi Pifta,
> > > > >
> > > > > Thanks for your reply.
> > > > > That's good news! Does Ozone also support other placement policies,
> > > > > like one replica in each of 3 different racks? That would be super
> > > > > useful from an operational point of view: it would be possible to
> > > > > put an entire rack into maintenance (for updates or other tasks) and
> > > > > be sure that the 2 other replicas are in 2 different racks that are
> > > > > still up and running, rather than losing 2 replicas.
> > > > >
> > > > > Is the placement policy also enforced during rebalancing, like in
> > > > > HDFS?
> > > > >
> > > > > Thanks,
> > > > > Michel
> > > > >
> > > > > On Thu, 9 Jul 2020 at 13:05, István Fajth <fapi...@gmail.com> wrote:
> > > > >
> > > > > > Hi Michel,
> > > > > >
> > > > > > Yes, Ozone has topology support (currently 3 levels are supported:
> > > > > > root, rack, node) to specify the cluster topology, similarly to
> > > > > > HDFS. With replication factor 3 it works as in HDFS and ensures
> > > > > > that container replicas reside in 2 racks: 2 in one rack and 1 in
> > > > > > another.
> > > > > > Also, the FileSystem APIs (o3fs:// and ofs://) implement the methods
> > > > > > required to provide locality information to clients, similarly to
> > > > > > HDFS, so YARN can take advantage of this information and bring
> > > > > > compute to the data as with HDFS.
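The three-level topology (root, rack, node) and the locality ordering a scheduler can derive from it are sketched below. The assumptions: nodes are identified by topology paths such as "/rack1/dn1", and distance is counted in hops to the closest common ancestor, using the same convention as HDFS's network topology (0 = same node, 2 = same rack, 4 = different racks). `split_path`, `distance` and `sort_by_locality` are hypothetical helpers, not Ozone APIs; on the client side the locality information itself is presumably exposed through Hadoop's `FileSystem.getFileBlockLocations`.

```python
def split_path(path):
    """'/rack1/dn1' -> ['rack1', 'dn1']; the root is the shared empty prefix."""
    return [part for part in path.split("/") if part]


def distance(a, b):
    """Hops from node a up to the closest common ancestor and down to node b."""
    pa, pb = split_path(a), split_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def sort_by_locality(client, replica_locations):
    """Order replica locations from closest to farthest for a given client."""
    return sorted(replica_locations, key=lambda loc: distance(client, loc))
```

For a client on "/rack1/dn1", a replica on the same node sorts first, one on "/rack1/dn2" second, and one on another rack last, which is the ordering that lets a scheduler such as YARN bring compute to the data.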
> > > > > >
> > > > > > It is worth noting that not many clusters are currently using
> > > > > > these features, but if any issues arise we are there to react, and
> > > > > > there are also plans to harden the system further. A couple of
> > > > > > items are already planned for after the soon-to-be-released 0.6.0;
> > > > > > you can check them in this JIRA (HDDS-3722)
> > > > > > <https://issues.apache.org/jira/browse/HDDS-3722>.
> > > > > >
> > > > > > If you have any further questions, feel free to ask :)
> > > > > > Pifta
> > > > > >
> > > > > > On Thu, 9 Jul 2020 at 12:57, Michel Sumbul <michelsum...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi guys,
> > > > > > >
> > > > > > > First, thanks for your work on this project; it looks really
> > > > > > > great as the next evolution of HDFS (if I can say that :-) )
> > > > > > >
> > > > > > > I saw in multiple slideshows on the web that Ozone will support
> > > > > > > data locality like HDFS.
> > > > > > > What's the status of that? Is it already implemented?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Michel
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Pifta
> > > > > >
> > > > >
> > > >
> > >
> >
>
