Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-31 Thread Enis Söztutar
inlined.

On Thu, Oct 27, 2016 at 3:01 PM, Stack  wrote:

> On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar  wrote:
>
> > A bit late, but let me give my perspective. This can also be moved to
> jira
> > or dev@ I think.
> >
> > DLR was nice and had pretty good gains for MTTR. However, dealing with
> > the sequence ids, onlining regions, etc. and the replay paths proved to be
> > too difficult in practice. I think the way forward would be to not bring
> > DLR back, but actually fix long-standing log split problems.
> >
> > The main gain in DLR is that we do not create lots and lots of tiny
> files,
> > but instead rely on the regular region flushes, to flush bigger files.
> This
> > also helps with handling requests coming from different log files etc.
> The
> > only gain that I can think of that you get with DLR, but not with log
> split
> > is the online enabling of writes while the recovery is going on.
> However, I
> > think it is not worth having DLR just for this feature.
> >
> >
> And not having to write intermediary files as you note at the start of your
> paragraph.
>
>
>
> > Now, what are the problems with Log Split you ask. The problems are
> >   - we create a lot of tiny files
> >   - these tiny files are replayed sequentially when the region is
> assigned
> >   - The region has to replay and flush all data sequentially coming from
> > all these tiny files.
> >
> >
> Longest pole in MTTR used to be noticing the RS had gone away in the first
> place. Let's not forget to add this to our list.
>
>
>
> > In terms of IO, we pay the cost of reading original WAL files, and
> writing
> > this same amount of data into many small files where the NN overhead is
> > huge. Then for every region, we serially sort the data by re-reading
> the
> > tiny WAL files (recovered edits) and sorting them in memory and flushing
> > the data. Which means we do 2 times the reads and writes that we should
> do
> > otherwise.
> >
> > The way to solve our log split bottlenecks is to re-read the Bigtable
> > paper and implement the WAL recovery as described there.
> >  - Implement an HFile format that can contain data from multiple regions.
> > Something like a concatenated HFile format where each region has its own
> > section, with its own sequence id, etc.
>
>  - Implement links to these files where a link can refer to this data. This
> > is very similar to our ReferenceFile concept.
>
>  - In each log splitter task, instead of generating tiny WAL files that are
> > recovered edits, we instead buffer up in memory, and do a sort (this is
> the
> > same sort of inserting into the memstore) per region. A WAL is ~100 MB on
> > average, so it should not be a big problem to buffer this up.
>
>
>
> Need to be able to spill. There will be anomalies.
>
>
>
> > At the end of
> > the WAL split task, write an hfile containing data from all the regions
> as
> > described above. Also do a multi NN request to create links in regions to
> > refer to these files (Not sure whether NN has a batch RPC call or not).
> >
> >
> It does not.
>
> So, doing an accounting, I see little difference from what we have now. In
> new scheme:
>
> + We read all WALs as before.
> + We write about the same (in current scheme, we'd aggregate
> across WAL so we didn't write a recovered edits file per WAL) though new
> scheme
>

Is the current scheme DLR or DLS (log split)? Compared to DLS, we will read
the WAL once and write sorted HFiles once, instead of reading the WAL twice
and writing small WALs (recovered.edits).
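As a back-of-envelope sketch of that accounting (idealized: it ignores
compression, NN metadata ops, and the compactions either scheme eventually
triggers; the 100 MB figure is just the average WAL size cited in this
thread):

public class RecoveryIoAccounting {
  public static void main(String[] args) {
    long walBytes = 100L << 20; // ~100 MB of edits in one WAL

    // Current log split (DLS): the split task reads the WAL and writes
    // recovered.edits; the region then re-reads those and flushes an hfile.
    long dlsReads = walBytes + walBytes;
    long dlsWrites = walBytes + walBytes;

    // Proposed scheme: the split task reads the WAL once and writes one
    // sorted hfile (the small link/reference files are ignored here).
    long newReads = walBytes;
    long newWrites = walBytes;

    System.out.printf("DLS: read %d MB, write %d MB%n", dlsReads >> 20, dlsWrites >> 20);
    System.out.printf("New: read %d MB, write %d MB%n", newReads >> 20, newWrites >> 20);
  }
}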


> maybe less since we currently flush after replay of recovered edits so we
> nail an
> hfile into the file system that has the recovered edits (but in new scheme,
> we'll bring
> on a compaction because we have references which will cause a rewrite of
> the big hfile
> into a smaller one...).
> + Metadata ops are about the same (rather than lots of small recovered-edits
> files, we write lots of small reference files).
>

Yes, I was assuming that we can do a batch createFiles() call in a single RPC,
reducing the NN overhead significantly. Lacking that, it is better to write
small hfiles directly under the region directory, as Phil suggests above.
Actually, if we do hfile writes directly with spilling in regular log split,
it will be a good incremental change.
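For reference, the batched namespace call assumed here would look roughly like
the following; it is purely hypothetical, since HDFS exposes no such RPC today
(as noted above):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch only: the batched create call wished for above. HDFS
 * has no such RPC, which is why writing small hfiles directly under each
 * region directory looks like the more practical fallback.
 */
public interface BatchFileCreator {
  // Create all of the given paths (e.g. one link file per region) in a
  // single round trip to the NameNode.
  void createFiles(List<Path> paths) throws IOException;
}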

If we have the file namespace (NN) in an hbase table (like meta), as we have
talked about in the 1M-regions jiras, we can gain a lot by writing a single
hfile per WAL. The "hard links" will be cells in the meta table, so we can do
a batch write the regular HBase way.
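A rough sketch of what those link cells could look like with the stock client
API; the table name (hbase:fsmeta), row layout, and column family below are
made up for illustration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch: one "hard link" cell per recovered region in a meta-like table,
 *  written with a single batched put (the regular HBase way). */
public class LinkCellSketch {

  public static void writeLinks(Connection conn, String splitHFile,
      List<String> regionEncodedNames) throws Exception {
    // Row = region, column qualifier = the shared hfile produced by the split
    // task. The value could carry the region's section offset/length and max
    // sequence id.
    try (Table fsMeta = conn.getTable(TableName.valueOf("hbase:fsmeta"))) {
      List<Put> puts = new ArrayList<>();
      for (String region : regionEncodedNames) {
        Put p = new Put(Bytes.toBytes(region));
        p.addColumn(Bytes.toBytes("link"), Bytes.toBytes(splitHFile),
            Bytes.toBytes(splitHFile));
        puts.add(p);
      }
      fsMeta.put(puts); // batched by the client; no per-link NN round trip
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      writeLinks(conn, "wal-1478000000000.hfile",
          Arrays.asList("1588230740", "4f2aa3b1c0de"));
    }
  }
}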


>
> ... only the current scheme does a distributed, parallelized sort and can
> spill if it doesn't fit in memory.
>

Doing a spill should not be a big problem as long as we bound the buffer
space per RS. Read a bunch of edits into an array of Cells-with-regions until
the buffer is full or we hit end of file, sort in memory, then write an
HFile. Yes, with DLR we are gaining by not spilling per-region-per-WAL,
we would 
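A minimal sketch of the bounded-buffer spill being described; the Edit layout,
heap accounting, and the hfile write are stand-ins, not HBase APIs:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Buffer edits for all regions seen by one split task, sort them, and spill
 *  to a (multi-region) hfile whenever a bounded amount of heap is used. */
public class SpillingEditBuffer {

  static final class Edit {
    final String region;
    final byte[] key;
    final long seqId;
    final byte[] value;

    Edit(String region, byte[] key, long seqId, byte[] value) {
      this.region = region;
      this.key = key;
      this.seqId = seqId;
      this.value = value;
    }

    long heapSize() {
      return 64 + key.length + value.length; // crude per-edit overhead guess
    }
  }

  private static int compareBytes(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int c = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
      if (c != 0) {
        return c;
      }
    }
    return a.length - b.length;
  }

  private static final Comparator<byte[]> KEY_ORDER = SpillingEditBuffer::compareBytes;
  private static final Comparator<Edit> EDIT_ORDER =
      Comparator.comparing((Edit e) -> e.region)
          .thenComparing((Edit e) -> e.key, KEY_ORDER)
          .thenComparingLong((Edit e) -> e.seqId);

  private final long maxHeapBytes; // the bound on buffer space per split task
  private final List<Edit> buffer = new ArrayList<>();
  private long usedBytes;

  public SpillingEditBuffer(long maxHeapBytes) {
    this.maxHeapBytes = maxHeapBytes;
  }

  public void append(Edit e) throws IOException {
    buffer.add(e);
    usedBytes += e.heapSize();
    if (usedBytes >= maxHeapBytes) {
      spill(); // an anomalously large WAL just produces a few extra spill files
    }
  }

  public void close() throws IOException {
    if (!buffer.isEmpty()) {
      spill();
    }
  }

  private void spill() throws IOException {
    buffer.sort(EDIT_ORDER); // group per region, key-order the edits within it
    writeHFileSections(buffer); // placeholder: one file, one section per region
    buffer.clear();
    usedBytes = 0;
  }

  private void writeHFileSections(List<Edit> sorted) throws IOException {
    // Intentionally a stub; the actual writer depends on the file format chosen.
  }
}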

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Stack
On Thu, Oct 27, 2016 at 3:01 PM, Stack  wrote:

> On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar  wrote:
>
>> A bit late, but let me give my perspective. This can also be moved to jira
>> or dev@ I think.
>>
>> DLR was nice and had pretty good gains for MTTR. However, dealing with
>> the sequence ids, onlining regions, etc. and the replay paths proved to be
>> too difficult in practice. I think the way forward would be to not bring
>> DLR back, but actually fix long-standing log split problems.
>>
>> The main gain in DLR is that we do not create lots and lots of tiny
>> files,
>> but instead rely on the regular region flushes, to flush bigger files.
>> This
>> also helps with handling requests coming from different log files etc. The
>> only gain that I can think of that you get with DLR, but not with log
>> split
>> is the online enabling of writes while the recovery is going on. However,
>> I
>> think it is not worth having DLR just for this feature.
>>
>>
> And not having to write intermediary files as you note at the start of
> your paragraph.
>
> I meant to say thanks for reviving this important topic.
St.Ack



>
>
>> Now, what are the problems with Log Split you ask. The problems are
>>   - we create a lot of tiny files
>>   - these tiny files are replayed sequentially when the region is assigned
>>   - The region has to replay and flush all data sequentially coming from
>> all these tiny files.
>>
>>
> Longest pole in MTTR used to be noticing the RS had gone away in the first
> place. Let's not forget to add this to our list.
>
>
>
>> In terms of IO, we pay the cost of reading original WAL files, and writing
>> this same amount of data into many small files where the NN overhead is
>> huge. Then for every region, we serially sort the data by re-reading
>> the
>> tiny WAL files (recovered edits) and sorting them in memory and flushing
>> the data. Which means we do 2 times the reads and writes that we should do
>> otherwise.
>>
>> The way to solve our log split bottlenecks is to re-read the Bigtable
>> paper and implement the WAL recovery as described there.
>>  - Implement an HFile format that can contain data from multiple regions.
>> Something like a concatenated HFile format where each region has its own
>> section, with its own sequence id, etc.
>
>  - Implement links to these files where a link can refer to this data. This
>> is very similar to our ReferenceFile concept.
>
>  - In each log splitter task, instead of generating tiny WAL files that are
>> recovered edits, we instead buffer up in memory, and do a sort (this is
>> the
>> same sort of inserting into the memstore) per region. A WAL is ~100 MB on
>> average, so it should not be a big problem to buffer this up.
>
>
>
> Need to be able to spill. There will be anomalies.
>
>
>
>> At the end of
>> the WAL split task, write an hfile containing data from all the regions as
>> described above. Also do a multi NN request to create links in regions to
>> refer to these files (Not sure whether NN has a batch RPC call or not).
>>
>>
> It does not.
>
> So, doing an accounting, I see little difference from what we have now. In
> new scheme:
>
> + We read all WALs as before.
> + We write about the same (in current scheme, we'd aggregate
> across WAL so we didn't write a recovered edits file per WAL) though new
> scheme
> maybe less since we currently flush after replay of recovered edits so we
> nail an
> hfile into the file system that has the recovered edits (but in new
> scheme, we'll bring
> on a compaction because we have references which will cause a rewrite of
> the big hfile
> into a smaller one...).
> + Metadata ops are about the same (rather than lots of small recovered-edits
> files, we write lots of small reference files).
>
> ... only the current scheme does a distributed, parallelized sort and can
> spill if it doesn't fit in memory.
>
> Am I doing the math right here?
>
> Is there a big improvement in MTTR? We are offline while we sort and write
> the big hfile and its
> references. We might save some because we just open the region after the
> above is done where
> now we have open and then replay recovered edits (though we could take
> writes in current
> scheme w/ a bit of work).
>
> Can we do better?
>
> St.Ack
>
>
>
>> The reason this will be on-par or better than DLR is that, we are only
>> doing 1 read and 1 write, and the sort is parallelized. The region opening
>> does not have to block on replaying anything or waiting for flush, because
>> the data is already sorted and in HFile format. These hfiles will be used
>> the normal way by adding them to the KVHeaps, etc. When compactions run,
>> we
>> will be removing the links to these files using the regular mechanisms.
>>
>> Enis
>>
>> On Tue, Oct 18, 2016 at 6:58 PM, Ted Yu  wrote:
>>
>> > Allan:
>> > One factor to consider is that the assignment manager in hbase 2.0
>> would be
>> > quite different from those in 0.98 and 

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Stack
On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar  wrote:

> A bit late, but let me give my perspective. This can also be moved to jira
> or dev@ I think.
>
> DLR was nice and had pretty good gains for MTTR. However, dealing with
> the sequence ids, onlining regions, etc. and the replay paths proved to be
> too difficult in practice. I think the way forward would be to not bring
> DLR back, but actually fix long-standing log split problems.
>
> The main gain in DLR is that we do not create lots and lots of tiny files,
> but instead rely on the regular region flushes, to flush bigger files. This
> also helps with handling requests coming from different log files etc. The
> only gain that I can think of that you get with DLR, but not with log split
> is the online enabling of writes while the recovery is going on. However, I
> think it is not worth having DLR just for this feature.
>
>
And not having to write intermediary files as you note at the start of your
paragraph.



> Now, what are the problems with Log Split you ask. The problems are
>   - we create a lot of tiny files
>   - these tiny files are replayed sequentially when the region is assigned
>   - The region has to replay and flush all data sequentially coming from
> all these tiny files.
>
>
The longest pole in MTTR used to be noticing that the RS had gone away in the
first place. Let's not forget to add this to our list.
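For what it's worth, that detection time is mostly governed by the ZooKeeper
session timeout; a hedged sketch of tightening it (with the usual caveat that
too low a value turns long GC pauses into false failovers, and that the ZK
server's own session-timeout bounds still apply):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FailureDetectionTuning {
  public static void main(String[] args) {
    // A crashed RS is only noticed once its ZooKeeper session expires, so this
    // timeout is the floor on MTTR before any log splitting even starts.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("zookeeper.session.timeout", 30000); // default of this era: 90000 ms
    System.out.println("zookeeper.session.timeout = "
        + conf.getInt("zookeeper.session.timeout", -1));
  }
}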



> In terms of IO, we pay the cost of reading original WAL files, and writing
> this same amount of data into many small files where the NN overhead is
> huge. Then for every region, we serially sort the data by re-reading the
> tiny WAL files (recovered edits) and sorting them in memory and flushing
> the data. Which means we do 2 times the reads and writes that we should do
> otherwise.
>
> The way to solve our log split bottlenecks is to re-read the Bigtable
> paper and implement the WAL recovery as described there.
>  - Implement an HFile format that can contain data from multiple regions.
> Something like a concatenated HFile format where each region has its own
> section, with its own sequence id, etc.

 - Implement links to these files where a link can refer to this data. This
> is very similar to our ReferenceFile concept.

 - In each log splitter task, instead of generating tiny WAL files that are
> recovered edits, we instead buffer up in memory, and do a sort (this is the
> same sort of inserting into the memstore) per region. A WAL is ~100 MB on
> average, so it should not be a big problem to buffer this up.



Need to be able to spill. There will be anomalies.



> At the end of
> the WAL split task, write an hfile containing data from all the regions as
> described above. Also do a multi NN request to create links in regions to
> refer to these files (Not sure whether NN has a batch RPC call or not).
>
>
It does not.

So, doing an accounting, I see little difference from what we have now. In the
new scheme:

+ We read all WALs as before.
+ We write about the same (in the current scheme, we'd aggregate across WALs
so we didn't write a recovered-edits file per WAL), though the new scheme may
write less, since we currently flush after replay of recovered edits and so
nail an hfile into the file system that holds the recovered edits (but in the
new scheme, we'll bring on a compaction because we have references, which
will cause a rewrite of the big hfile into a smaller one...).
+ Metadata ops are about the same (rather than lots of small recovered-edits
files, we write lots of small reference files).

... only the current scheme does a distributed, parallelized sort and can
spill if it doesn't fit in memory.

Am I doing the math right here?

Is there a big improvement in MTTR? We are offline while we sort and write
the big hfile and its references. We might save some because we just open the
region after the above is done, whereas now we open and then replay recovered
edits (though we could take writes in the current scheme w/ a bit of work).

Can we do better?

St.Ack



> The reason this will be on-par or better than DLR is that, we are only
> doing 1 read and 1 write, and the sort is parallelized. The region opening
> does not have to block on replaying anything or waiting for flush, because
> the data is already sorted and in HFile format. These hfiles will be used
> the normal way by adding them to the KVHeaps, etc. When compactions run, we
> will be removing the links to these files using the regular mechanisms.
>
> Enis
>
> On Tue, Oct 18, 2016 at 6:58 PM, Ted Yu  wrote:
>
> > Allan:
> > One factor to consider is that the assignment manager in hbase 2.0 would
> be
> > quite different from those in 0.98 and 1.x branches.
> >
> > Meaning, you may need to come up with two solutions for a single problem.
> >
> > FYI
> >
> > On Tue, Oct 18, 2016 at 6:11 PM, Allan Yang  wrote:
> >
> > > Hi, Ted
> > > These issues I mentioned above(HBASE-13567, HBASE-12743, HBASE-13535,
> > > HBASE-14729) are ALL 

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Phil Yang
Hi all

We are also considering how to improve MTTR these days and have a plan. I
just noticed this thread; hoping I am not late :)

My original thought is simple: why must we put entries into the MemStore of
the new RS? We can read/write WAL entries only once in failover, going from
log entries to HFiles, which is similar to Enis's thought. My idea is to
flush them to HFiles in their region's directory at the end of splitting, one
HFile per region, and then assign the region to an RS. The new RS doesn't
replay any logs; it can open the region directly. The difference between the
new RS and the crashed RS is that the new RS has one or a few small HFiles,
and its MemStore is empty. If we have too many small HFiles we can submit a
minor compaction task, but by the time we compact, the region is already
online.
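A rough sketch of that end-of-split flush using the HFile writer API; the
path/naming scheme is invented, and real code would also have to record the
store-file metadata (e.g. the max sequence id) the region opener expects:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;

/** Sketch: at the end of a split task, flush one region's buffered, already
 *  sorted edits straight into a small hfile under that region's family
 *  directory, so the region can be opened without replaying recovered.edits. */
public class SplitToHFileSketch {

  public static Path flushRegionEdits(Configuration conf, FileSystem fs,
      Path regionDir, String family, String walName,
      List<Cell> sortedCells) throws IOException {
    // Illustrative name only; a real patch needs a naming scheme that the
    // region opener and compactions both understand.
    Path out = new Path(new Path(regionDir, family),
        "recovered-" + walName + ".hfile");
    HFileContext ctx = new HFileContextBuilder().withBlockSize(64 * 1024).build();
    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, out)
        .withFileContext(ctx)
        .create();
    try {
      for (Cell c : sortedCells) { // must already be in memstore sort order
        writer.append(c);
      }
    } finally {
      writer.close();
    }
    return out;
  }
}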

Although this approach needs to write many small HFiles rather than one big
HFile for each log file, today for a ReferenceFile we also write a small file
containing the region and split key to the region's directory. Duo Zhang told
me this is because HDFS doesn't support hard links. So if we write them to
many HFiles, the number of files in HDFS will not be increased. And if we
write entries to many HFiles, we can use the current logic for all steps,
which is simple to implement. For example, we can use a MemStore to save
entries for one region and use the flush logic to write it to the directory.
We don't need a new HFile format, and we can still check whether we have split
logs, so we keep compatibility with the old version; even if the RSs in the
cluster have different split logic, we can still recover all regions.

Thanks,
Phil


2016-10-22 6:24 GMT+08:00 Enis Söztutar :

> A bit late, but let me give my perspective. This can also be moved to jira
> or dev@ I think.
>
> DLR was nice and had pretty good gains for MTTR. However, dealing with
> the sequence ids, onlining regions, etc. and the replay paths proved to be
> too difficult in practice. I think the way forward would be to not bring
> DLR back, but actually fix long-standing log split problems.
>
> The main gain in DLR is that we do not create lots and lots of tiny files,
> but instead rely on the regular region flushes, to flush bigger files. This
> also helps with handling requests coming from different log files etc. The
> only gain that I can think of that you get with DLR, but not with log split
> is the online enabling of writes while the recovery is going on. However, I
> think it is not worth having DLR just for this feature.
>
> Now, what are the problems with Log Split you ask. The problems are
>   - we create a lot of tiny files
>   - these tiny files are replayed sequentially when the region is assigned
>   - The region has to replay and flush all data sequentially coming from
> all these tiny files.
>
> In terms of IO, we pay the cost of reading original WAL files, and writing
> this same amount of data into many small files where the NN overhead is
> huge. Then for every region, we serially sort the data by re-reading the
> tiny WAL files (recovered edits) and sorting them in memory and flushing
> the data. Which means we do 2 times the reads and writes that we should do
> otherwise.
>
> The way to solve our log split bottlenecks is to re-read the Bigtable
> paper and implement the WAL recovery as described there.
>  - Implement an HFile format that can contain data from multiple regions.
> Something like a concatenated HFile format where each region has its own
> section, with its own sequence id, etc.
>  - Implement links to these files where a link can refer to this data. This
> is very similar to our ReferenceFile concept.
>  - In each log splitter task, instead of generating tiny WAL files that are
> recovered edits, we instead buffer up in memory, and do a sort (this is the
> same sort of inserting into the memstore) per region. A WAL is ~100 MB on
> average, so it should not be a big problem to buffer this up. At the end of
> the WAL split task, write an hfile containing data from all the regions as
> described above. Also do a multi NN request to create links in regions to
> refer to these files (Not sure whether NN has a batch RPC call or not).
>
> The reason this will be on-par or better than DLR is that, we are only
> doing 1 read and 1 write, and the sort is parallelized. The region opening
> does not have to block on replaying anything or waiting for flush, because
> the data is already sorted and in HFile format. These hfiles will be used
> the normal way by adding them to the KVHeaps, etc. When compactions run, we
> will be removing the links to these files using the regular mechanisms.
>
> Enis
>
> On Tue, Oct 18, 2016 at 6:58 PM, Ted Yu  wrote:
>
> > Allan:
> > One factor to consider is that the assignment manager in hbase 2.0 would
> be
> > quite different from those in 0.98 and 1.x branches.
> >
> > Meaning, you may need to come up with two solutions for a single problem.
> >
> > FYI
> >
> > On Tue, Oct 18, 2016 at 6:11 PM, 

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-21 Thread Enis Söztutar
A bit late, but let me give my perspective. This can also be moved to jira
or dev@ I think.

DLR was nice and had pretty good gains for MTTR. However, dealing with
the sequence ids, onlining regions, etc. and the replay paths proved to be
too difficult in practice. I think the way forward would be to not bring
DLR back, but actually fix long-standing log split problems.

The main gain in DLR is that we do not create lots and lots of tiny files,
but instead rely on the regular region flushes to flush bigger files. This
also helps with handling requests coming from different log files, etc. The
only gain that I can think of that you get with DLR, but not with log split,
is the online enabling of writes while the recovery is going on. However, I
think it is not worth having DLR just for this feature.

Now, what are the problems with Log Split, you ask? The problems are:
  - we create a lot of tiny files
  - these tiny files are replayed sequentially when the region is assigned
  - the region has to replay and flush all data sequentially coming from
all these tiny files.

In terms of IO, we pay the cost of reading the original WAL files and writing
this same amount of data into many small files, where the NN overhead is
huge. Then, for every region, we serially sort the data by re-reading the
tiny WAL files (recovered edits), sorting them in memory, and flushing the
data, which means we do twice the reads and writes we would otherwise.

The way to solve our log split bottlenecks is to re-read the Bigtable
paper and implement the WAL recovery as described there.
 - Implement an HFile format that can contain data from multiple regions.
Something like a concatenated HFile format where each region has its own
section, with its own sequence id, etc.
 - Implement links to these files, where a link can refer to this data. This
is very similar to our ReferenceFile concept.
 - In each log splitter task, instead of generating tiny WAL files that are
recovered edits, we instead buffer up in memory and do a sort (this is the
same sort as inserting into the memstore) per region. A WAL is ~100 MB on
average, so it should not be a big problem to buffer this up. At the end of
the WAL split task, write an hfile containing data from all the regions as
described above. Also do a multi (batch) NN request to create links in the
regions to refer to these files (not sure whether the NN has a batch RPC
call or not).
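To make the shape of those per-region sections and links a bit more concrete,
a minimal sketch of the bookkeeping a split task might emit; the field set is
a guess, not a worked-out format:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the bookkeeping for a "concatenated" hfile written by one WAL
 *  split task: every region gets a contiguous section with its own max
 *  sequence id, and a small ReferenceFile-like link per region points back
 *  into the shared file. */
public class MultiRegionHFileSketch {

  /** What a per-region link into the shared hfile might need to carry. */
  public static final class RegionSectionLink {
    public final String regionEncodedName;
    public final String hfilePath;   // the shared, multi-region hfile
    public final long sectionOffset; // where this region's cells start
    public final long sectionLength;
    public final long maxSeqId;      // lets the region skip already-flushed edits

    public RegionSectionLink(String region, String path, long offset,
        long length, long maxSeqId) {
      this.regionEncodedName = region;
      this.hfilePath = path;
      this.sectionOffset = offset;
      this.sectionLength = length;
      this.maxSeqId = maxSeqId;
    }
  }

  /** Section info recorded while the split task writes each region's cells. */
  public static final class Section {
    public final long offset;
    public final long length;
    public final long maxSeqId;

    public Section(long offset, long length, long maxSeqId) {
      this.offset = offset;
      this.length = length;
      this.maxSeqId = maxSeqId;
    }
  }

  /** After the shared hfile is closed, turn the recorded sections into the
   *  links to create, whether as tiny files in each region directory or as
   *  cells in a meta-like table. */
  public static List<RegionSectionLink> buildLinks(String hfilePath,
      Map<String, Section> sectionsByRegion) {
    List<RegionSectionLink> links = new ArrayList<>();
    for (Map.Entry<String, Section> e : new TreeMap<>(sectionsByRegion).entrySet()) {
      Section s = e.getValue();
      links.add(new RegionSectionLink(e.getKey(), hfilePath,
          s.offset, s.length, s.maxSeqId));
    }
    return links;
  }
}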

The reason this will be on par with or better than DLR is that we are only
doing 1 read and 1 write, and the sort is parallelized. The region opening
does not have to block on replaying anything or waiting for a flush, because
the data is already sorted and in HFile format. These hfiles will be used
the normal way by adding them to the KVHeaps, etc. When compactions run, we
will remove the links to these files using the regular mechanisms.

Enis

On Tue, Oct 18, 2016 at 6:58 PM, Ted Yu  wrote:

> Allan:
> One factor to consider is that the assignment manager in hbase 2.0 would be
> quite different from those in 0.98 and 1.x branches.
>
> Meaning, you may need to come up with two solutions for a single problem.
>
> FYI
>
> On Tue, Oct 18, 2016 at 6:11 PM, Allan Yang  wrote:
>
> > Hi, Ted
> > These issues I mentioned above(HBASE-13567, HBASE-12743, HBASE-13535,
> > HBASE-14729) are ALL reproduced in our HBase1.x test environment. Fixing
> > them is exactly what I'm going to do. I haven't found the root cause yet,
> > but  I will update if I find solutions.
> >  what I afraid is that, there are other issues I don't know yet. So if
> you
> > or other guys know other issues related to DLR, please let me know
> >
> >
> > Regards
> > Allan Yang
> >
> >
> >
> >
> >
> >
> >
> > At 2016-10-19 00:19:06, "Ted Yu"  wrote:
> > >Allan:
> > >I wonder how you deal with open issues such as HBASE-13535.
> > >From your description, it seems your team fixed more DLR issues.
> > >
> > >Cheers
> > >
> > >On Mon, Oct 17, 2016 at 11:37 PM, allanwin  wrote:
> > >
> > >>
> > >>
> > >>
> > >> Here is the thing. We have backported DLR (HBASE-7006) to our 0.94
> > >> clusters in our production environment (of course a lot of bugs were
> > >> fixed and it is working well). It was proven to be a huge gain. When a
> > >> large cluster crashes, the MTTR improved from several hours to less than
> > >> an hour. Now we want to move on to HBase1.x, and we still want DLR. This
> > >> time, we don't want to backport the 'backported' DLR to HBase1.x, but it
> > >> seems the community has decided to remove DLR...
> > >>
> > >>
> > >> The DLR feature is proven useful in our production environment, so I
> > think
> > >> I will try to fix its issues in branch-1.x
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> At 2016-10-18 13:47:17, "Anoop John"  wrote:
> > >> >Agree with ur observation.. But DLR feature we wanted to get
> removed..
> > >> >Because it 

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-18 Thread Ted Yu
Allan:
One factor to consider is that the assignment manager in hbase 2.0 would be
quite different from those in 0.98 and 1.x branches.

Meaning, you may need to come up with two solutions for a single problem.

FYI

On Tue, Oct 18, 2016 at 6:11 PM, Allan Yang  wrote:

> Hi, Ted
> These issues I mentioned above (HBASE-13567, HBASE-12743, HBASE-13535,
> HBASE-14729) are ALL reproduced in our HBase1.x test environment. Fixing
> them is exactly what I'm going to do. I haven't found the root cause yet,
> but I will update if I find solutions.
> What I'm afraid of is that there are other issues I don't know about yet. So
> if you or other guys know of other issues related to DLR, please let me know.
>
>
> Regards
> Allan Yang
>
>
>
>
>
>
>
> At 2016-10-19 00:19:06, "Ted Yu"  wrote:
> >Allan:
> >I wonder how you deal with open issues such as HBASE-13535.
> >From your description, it seems your team fixed more DLR issues.
> >
> >Cheers
> >
> >On Mon, Oct 17, 2016 at 11:37 PM, allanwin  wrote:
> >
> >>
> >>
> >>
> >> Here is the thing. We have backported DLR (HBASE-7006) to our 0.94
> >> clusters in our production environment (of course a lot of bugs were
> >> fixed and it is working well). It was proven to be a huge gain. When a
> >> large cluster crashes, the MTTR improved from several hours to less than
> >> an hour. Now we want to move on to HBase1.x, and we still want DLR. This
> >> time, we don't want to backport the 'backported' DLR to HBase1.x, but it
> >> seems the community has decided to remove DLR...
> >>
> >>
> >> The DLR feature is proven useful in our production environment, so I
> think
> >> I will try to fix its issues in branch-1.x
> >>
> >>
> >>
> >>
> >>
> >>
> >> At 2016-10-18 13:47:17, "Anoop John"  wrote:
> >Agree with your observation.. But the DLR feature we wanted to get removed..
> >> >Because it is known to have issues..  Or else we need major work to
> >> >correct all these issues.
> >> >
> >> >-Anoop-
> >> >
> >> >On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu  wrote:
> >> >> If you have a cluster, I suggest you turn on DLR and observe the
> effect
> >> >> where fewer than half the region servers are up after the crash.
> >> >> You would have first hand experience that way.
> >> >>
> >> >> On Mon, Oct 17, 2016 at 6:33 PM, allanwin  wrote:
> >> >>
> >> >>>
> >> >>>
> >> >>>
> >> >>> Yes, region replica is a good way to improve MTTR. Specially if one
> or
> >> two
> >> >>> servers are down, region replica can improve data availability. But
> >> for big
> >> >>> disaster like 1/3 or 1/2 region servers shutdown, I think DLR still
> >> useful
> >> >>> to bring regions online more quickly and with less IO usage.
> >> >>>
> >> >>>
> >> >>> Regards
> >> >>> Allan Yang
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> At 2016-10-17 21:01:16, "Ted Yu"  wrote:
> >> >>> >Here was the thread discussing DLR:
> >> >>> >
> >> >>> >http://search-hadoop.com/m/YGbbOxBK2n4ES12=Re+
> >> >>> DISCUSS+retiring+current+DLR+code
> >> >>> >
> >> >>> >> On Oct 17, 2016, at 4:15 AM, allanwin  wrote:
> >> >>> >>
> >> >>> >> Hi, All
> >> >>> >>  DLR can improve MTTR dramatically, but since it have many bugs
> like
> >> >>> HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729(any more I'don't
> >> know?),
> >> >>> it was proved unreliable, and has been deprecated almost in all
> >> branches
> >> >>> now.
> >> >>> >>
> >> >>> >>
> >> >>> >> My question is, is there any other way other than DLR to improve
> >> MTTR?
> >> >>> 'Cause If a big cluster crashes, It takes a long time to bring
> regions
> >> >>> online, not to mention it will create huge pressure on the IOs.
> >> >>> >>
> >> >>> >>
> >> >>> >> To tell the truth, I still want DLR back, if the community don't
> >> have
> >> >>> any plan to bring back DLR, I may want to figure out the problems in
> >> DLR
> >> >>> and make it working and reliable, Any suggests for that?
> >> >>> >>
> >> >>> >>
> >> >>> >> sincerely
> >> >>> >> Allan Yang
> >> >>>
> >>
>


Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-18 Thread Ted Yu
Allan:
I wonder how you deal with open issues such as HBASE-13535.
From your description, it seems your team fixed more DLR issues.

Cheers

On Mon, Oct 17, 2016 at 11:37 PM, allanwin  wrote:

>
>
>
> Here is the thing. We have backported DLR (HBASE-7006) to our 0.94
> clusters in our production environment (of course a lot of bugs were fixed
> and it is working well). It was proven to be a huge gain. When a large
> cluster crashes, the MTTR improved from several hours to less than an
> hour. Now we want to move on to HBase1.x, and we still want DLR. This
> time, we don't want to backport the 'backported' DLR to HBase1.x, but it
> seems the community has decided to remove DLR...
>
>
> The DLR feature is proven useful in our production environment, so I think
> I will try to fix its issues in branch-1.x
>
>
>
>
>
>
> At 2016-10-18 13:47:17, "Anoop John"  wrote:
> >Agree with your observation.. But the DLR feature we wanted to get removed..
> >Because it is known to have issues..  Or else we need major work to
> >correct all these issues.
> >
> >-Anoop-
> >
> >On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu  wrote:
> >> If you have a cluster, I suggest you turn on DLR and observe the effect
> >> where fewer than half the region servers are up after the crash.
> >> You would have first hand experience that way.
> >>
> >> On Mon, Oct 17, 2016 at 6:33 PM, allanwin  wrote:
> >>
> >>>
> >>>
> >>>
> >>> Yes, region replica is a good way to improve MTTR. Especially if one or
> >>> two servers are down, region replica can improve data availability. But
> >>> for a big disaster like 1/3 or 1/2 of the region servers shutting down,
> >>> I think DLR is still useful to bring regions online more quickly and
> >>> with less IO usage.
> >>>
> >>>
> >>> Regards
> >>> Allan Yang
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> At 2016-10-17 21:01:16, "Ted Yu"  wrote:
> >>> >Here was the thread discussing DLR:
> >>> >
> >>> >http://search-hadoop.com/m/YGbbOxBK2n4ES12=Re+
> >>> DISCUSS+retiring+current+DLR+code
> >>> >
> >>> >> On Oct 17, 2016, at 4:15 AM, allanwin  wrote:
> >>> >>
> >>> >> Hi, All
> >>> >>  DLR can improve MTTR dramatically, but since it has many bugs like
> >>> HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729 (any more I don't
> >>> know of?), it was proven unreliable, and has been deprecated in almost
> >>> all branches now.
> >>> >>
> >>> >>
> >>> >> My question is, is there any other way other than DLR to improve
> >>> MTTR? 'Cause if a big cluster crashes, it takes a long time to bring
> >>> regions online, not to mention it will create huge pressure on the IOs.
> >>> >>
> >>> >>
> >>> >> To tell the truth, I still want DLR back. If the community doesn't
> >>> have any plan to bring back DLR, I may want to figure out the problems
> >>> in DLR and make it work reliably. Any suggestions for that?
> >>> >>
> >>> >>
> >>> >> sincerely
> >>> >> Allan Yang
> >>>
>