Hi, Ted

The issues I mentioned above (HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729) are all reproduced in our HBase 1.x test environment. Fixing them is exactly what I'm going to do. I haven't found the root causes yet, but I will post an update if I find solutions. What I'm afraid of is that there are other issues I don't know about yet. So if you or anyone else knows of other issues related to DLR, please let me know.

Regards
Allan Yang

At 2016-10-19 00:19:06, "Ted Yu" <[email protected]> wrote:
>Allan:
>I wonder how you deal with open issues such as HBASE-13535.
>From your description, it seems your team fixed more DLR issues.
>
>Cheers
>
>On Mon, Oct 17, 2016 at 11:37 PM, allanwin <[email protected]> wrote:
>
>>
>> Here is the thing. We have backported DLR (HBASE-7006) to our 0.94
>> clusters in the production environment (of course, a lot of bugs were
>> fixed and it is working well). It has proven to be a huge gain. When a
>> large cluster crashes, the MTTR improves from several hours to less than
>> an hour. Now we want to move on to HBase 1.x, and we still want DLR. This
>> time, we don't want to backport the 'backported' DLR to HBase 1.x, but it
>> seems that the community has decided to remove DLR...
>>
>> The DLR feature has proven useful in our production environment, so I
>> think I will try to fix its issues in branch-1.x
>>
>> At 2016-10-18 13:47:17, "Anoop John" <[email protected]> wrote:
>> >Agree with your observation. But we wanted to get the DLR feature
>> >removed, because it is known to have issues. Or else we need major
>> >work to correct all these issues.
>> >
>> >-Anoop-
>> >
>> >On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu <[email protected]> wrote:
>> >> If you have a cluster, I suggest you turn on DLR and observe the effect
>> >> where fewer than half the region servers are up after the crash.
>> >> You would have first hand experience that way.
>> >>
>> >> On Mon, Oct 17, 2016 at 6:33 PM, allanwin <[email protected]> wrote:
>> >>
>> >>>
>> >>> Yes, region replica is a good way to improve MTTR. Especially if one
>> >>> or two servers are down, region replica can improve data availability.
>> >>> But for a big disaster, like 1/3 or 1/2 of the region servers shutting
>> >>> down, I think DLR is still useful to bring regions online more quickly
>> >>> and with less IO usage.
>> >>>
>> >>> Regards
>> >>> Allan Yang
>> >>>
>> >>> At 2016-10-17 21:01:16, "Ted Yu" <[email protected]> wrote:
>> >>> >Here was the thread discussing DLR:
>> >>> >
>> >>> >http://search-hadoop.com/m/YGbbOxBK2n4ES12&subj=Re+DISCUSS+retiring+current+DLR+code
>> >>> >
>> >>> >> On Oct 17, 2016, at 4:15 AM, allanwin <[email protected]> wrote:
>> >>> >>
>> >>> >> Hi, All
>> >>> >> DLR can improve MTTR dramatically, but since it has many bugs like
>> >>> >> HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729 (are there any
>> >>> >> more I don't know of?), it has proved unreliable and has been
>> >>> >> deprecated in almost all branches now.
>> >>> >>
>> >>> >> My question is, is there any way other than DLR to improve MTTR?
>> >>> >> Because if a big cluster crashes, it takes a long time to bring
>> >>> >> regions online, not to mention that it creates huge pressure on
>> >>> >> the IO.
>> >>> >>
>> >>> >> To tell the truth, I still want DLR back. If the community doesn't
>> >>> >> have any plan to bring back DLR, I may want to figure out the
>> >>> >> problems in DLR and make it working and reliable. Any suggestions
>> >>> >> for that?
>> >>> >>
>> >>> >> Sincerely
>> >>> >> Allan Yang
>> >>> >>
