I think the order will matter if you run with, say, replication factor 2.

- Dave

On Thu, Apr 4, 2013 at 11:30 AM, lars hofhansl <[email protected]> wrote:

> >> When the write request returns to the client there will be a local
> >> copy, a copy on another machine in the same rack, and a copy on a
> >> machine in a different rack, who cares about the ordering inside the
> >> pipeline?
> > Not necessarily. There might not be any additional copy on a different
> > machine on the same rack. BUT.. As you said, who cares ;) As long as
> > we have the local copy and some replicas.
>
> Really? Doesn't the whole pipeline have to be successful in order to
> return success to the client?
> (I might be confused :) )
>
> ________________________________
> From: Jean-Marc Spaggiari <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Thursday, April 4, 2013 11:24 AM
> Subject: Re: confused info about region-regionserver locality
>
> > Isn't this done via pipelining anyway?
> Yes, it's the way it's done.
>
> > So there's no notion of ordering with respect to the 1st, 2nd, and 3rd
> > block, either all writes go through the pipeline or none are.
> Still correct.
>
> > When the write request returns to the client there will be a local
> > copy, a copy on another machine in the same rack, and a copy on a
> > machine in a different rack, who cares about the ordering inside the
> > pipeline?
> Not necessarily. There might not be any additional copy on a different
> machine on the same rack. BUT.. As you said, who cares ;) As long as
> we have the local copy and some replicas.
>
> I have updated the documentation already. I will open the JIRA and
> submit. I have also added subsequent replicas in case replication
> factor is > 3.
>
> JM
>
> 2013/4/4 lars hofhansl <[email protected]>:
> > Isn't this done via pipelining anyway?
> > So there's no notion of ordering with respect to the 1st, 2nd, and 3rd
> > block, either all writes go through the pipeline or none are.
> >
> > When the write request returns to the client there will be a local
> > copy, a copy on another machine in the same rack, and a copy on a
> > machine in a different rack, who cares about the ordering inside the
> > pipeline?
> >
> > Seems it would also be inefficient to pipeline from the local rack to
> > another one and then in the same pipeline back into the local rack
> > (more load on the switch connecting the racks with no benefit).
> >
> > I'll double check.
> >
> > -- Lars
> >
> > ________________________________
> > From: Jean-Marc Spaggiari <[email protected]>
> > To: [email protected]
> > Sent: Thursday, April 4, 2013 8:25 AM
> > Subject: Re: confused info about region-regionserver locality
> >
> > Hi,
> >
> > I think you're right and the documentation needs to be updated.
> >
> > The 3rd replica is written on a random node in the same rack as the
> > 2nd replica. I will double check. Can you please open a JIRA so this
> > is updated?
> >
> > JM
> >
> > 2013/4/4 KIM JUN YOUNG <[email protected]>:
> >> Hi All.
> >>
> >> There is some confusion about region-regionserver locality.
> >>
> >> From the current document,
> >>
> >> http://hbase.apache.org/book/regions.arch.html
> >> 9.7.3. Region-RegionServer Locality
> >> Over time, Region-RegionServer locality is achieved via HDFS block
> >> replication. The HDFS client does the following by default when
> >> choosing locations to write replicas:
> >>
> >> First replica is written to local node
> >> Second replica is written to another node in same rack
> >> Third replica is written to a node in another rack (if sufficient nodes)
> >>
> >> But my understanding is different. HDFS writes block replicas as
> >> follows:
> >>
> >> first, local node
> >> second, another node in another rack
> >> third, another random node in the same rack
> >>
> >> Does this need to be changed? Or am I missing something?
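
For anyone reading this thread later, here is a rough sketch of the placement order KIM describes and JM confirms above (local node, then a node in another rack, then another node in the same rack as the second replica). It is only a toy model, not HDFS's actual placement code: the function name choose_replica_targets, the dict-based topology, and the fallback behavior when a rack has no free node are all assumptions made for illustration, and the writer is assumed to be a node listed in the topology.

```python
import random


def choose_replica_targets(topology, writer, replication, rng=random):
    """Toy model of the placement order discussed in this thread.

    topology: dict mapping rack name -> list of node names (hypothetical shape).
    writer:   node issuing the write; assumed to appear in the topology.
    Returns the chosen nodes in pipeline order.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    targets = []

    def pick(preferred):
        # Prefer unused nodes from `preferred`; otherwise fall back to any
        # unused node in the cluster (an assumption of this sketch).
        pool = [n for n in preferred if n not in targets]
        if not pool:
            pool = [n for n in rack_of if n not in targets]
        if pool:
            targets.append(rng.choice(pool))

    for i in range(replication):
        if i == 0:
            # 1st replica: the local node.
            pick([writer])
        elif i == 1:
            # 2nd replica: a node in a different rack than the writer.
            pick([n for n in rack_of if rack_of[n] != rack_of[writer]])
        elif i == 2 and len(targets) == 2:
            # 3rd replica: another node in the same rack as the 2nd replica.
            pick(topology[rack_of[targets[1]]])
        else:
            # Further replicas: any remaining node.
            pick([])
    return targets


if __name__ == "__main__":
    racks = {"rackA": ["a1", "a2", "a3"], "rackB": ["b1", "b2", "b3"]}
    print(choose_replica_targets(racks, "a1", 3))
    print(choose_replica_targets(racks, "a1", 2))
```

Under this model, a replication factor of 2 puts the second copy in the other rack, whereas the ordering in the old documentation would have kept both copies in the writer's rack. That difference may be what Dave's remark about replication factor 2 is pointing at.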
