For details about the HDFS write process, see: https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
On Fri, Jul 3, 2020 at 3:21 PM Paul Carey <[email protected]> wrote:
> That's very helpful, many thanks.
>
> On Fri, Jul 3, 2020 at 2:36 PM 张铎 (Duo Zhang) <[email protected]> wrote:
> >
> > You can see my design doc for async dfs output:
> >
> > https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#heading=h.2jvw6cxnmirr
> >
> > See the footnote below section 3.4. For the current HDFS pipeline
> > implementation, it could be a problem for replication in HBase, though
> > it rarely happens.
> >
> > HBase now has its own AsyncFSWAL implementation; HBASE-14004 was used
> > to resolve the problem (although we later broke things, and HBASE-24625
> > is the fix).
> >
> > And for WAL recovery, it will not be a problem. We only return success
> > to the client after all the replicas have been successfully committed,
> > so if DN2 goes offline, we will close the current file and commit it,
> > and open a new file to write the WAL.
> >
> > Thanks.
> >
> > On Fri, Jul 3, 2020 at 7:40 PM Paul Carey <[email protected]> wrote:
> > >
> > > > If the hdfs write succeeded while you had only one DN available,
> > > > then the other replica on the offline DN would be invalid now.
> > >
> > > Interesting, I wasn't aware of this. Are there any docs you could
> > > point me towards where this is described? I've had a look in Hadoop:
> > > The Definitive Guide and the official docs, but hadn't come across
> > > this.
> > >
> > > On Fri, Jul 3, 2020 at 11:19 AM Wellington Chevreuil
> > > <[email protected]> wrote:
> > > >
> > > > This is actually an hdfs consistency question, not hbase. If the
> > > > hdfs write succeeded while you had only one DN available, then the
> > > > other replica on the offline DN would be invalid now. What you have
> > > > then is an under-replicated block, and if your only available DN
> > > > goes offline before it could be replicated, the file that block
> > > > belongs to is now corrupt. If you bring the previously offline DN
> > > > back up, the file would still be corrupt, as the replica it has is
> > > > no longer valid (the NN knows which is the last valid version of
> > > > the replica). So unless you can bring back the DN that has the only
> > > > valid replica, your hfile is corrupt and your data is lost.
> > > >
> > > > On Fri, 3 Jul 2020, 09:12 Paul Carey, <[email protected]> wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > I'd like to understand how HBase deals with the situation where
> > > > > the only available DataNodes for a given offline Region contain
> > > > > stale data. Will HBase allow the Region to be brought online
> > > > > again, effectively making the inconsistency permanent, or will it
> > > > > refuse to do so?
> > > > >
> > > > > My question is motivated by seeing how Kafka and Elasticsearch
> > > > > handle this scenario. They both allow the inconsistency to become
> > > > > permanent, Kafka via unclean leader election, and Elasticsearch
> > > > > via the allocate_stale_primary command.
> > > > >
> > > > > To better understand my question, please consider the following
> > > > > example:
> > > > >
> > > > > - HDFS is configured with `dfs.replication=2` and
> > > > >   `dfs.namenode.replication.min=1`
> > > > > - DataNodes DN1 and DN2 contain the blocks for Region R1
> > > > > - DN2 goes offline
> > > > > - R1 receives a write, which succeeds as it can be written
> > > > >   successfully to DN1
> > > > > - DN1 goes offline before the NameNode can replicate the
> > > > >   under-replicated block containing the write to another DataNode
> > > > > - At this point R1 is offline
> > > > > - DN2 comes back online, but it does not contain the missed write
> > > > >
> > > > > There are now two options:
> > > > >
> > > > > - R1 is brought back online, violating consistency
> > > > > - R1 remains offline, indefinitely, until DN1 is brought back
> > > > >   online
> > > > >
> > > > > How does HBase deal with this situation?
> > > > >
> > > > > Many thanks
> > > > >
> > > > > Paul
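As a rough sketch, the HDFS configuration in the example scenario at the bottom of the thread corresponds to an hdfs-site.xml along these lines (the property names are the standard HDFS keys quoted above; the values simply mirror the example and are not a recommendation):

    <!-- hdfs-site.xml fragment matching the example scenario (illustrative only) -->
    <configuration>
      <property>
        <!-- each block should be kept on two DataNodes -->
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <!-- a block write may complete once a single replica has been persisted -->
        <name>dfs.namenode.replication.min</name>
        <value>1</value>
      </property>
    </configuration>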

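To observe the block states Wellington describes (under-replicated vs. corrupt) on a live cluster, the standard hdfs fsck tool can be pointed at the affected files; the region path below is purely illustrative and not taken from the thread:

    # Report block-level health (replica counts and locations) for a path;
    # the path here stands in for a hypothetical HBase region directory.
    hdfs fsck /hbase/data/default/mytable -files -blocks -locations

    # List every file in the namespace that currently has a corrupt block.
    hdfs fsck / -list-corruptfileblocks

An under-replicated block shows up in the first report; once the DataNode holding the last valid replica is gone, the file appears in the corrupt-file list, matching the "data is lost" outcome described in the thread.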