Re: [DISCUSS] Hbase Backup design changes

2021-07-25 Thread Mallikarjun
Thanks Duo.

---
Mallikarjun


On Sun, Jul 25, 2021 at 7:32 PM 张铎(Duo Zhang) wrote:

> Replied on jira. Please give more details about what you are doing in the
> PR...
>
> Thanks.
>
> On Sun, Jul 25, 2021 at 10:48 AM Mallikarjun wrote:
> >
> > Can someone review this pull request?
> > https://github.com/apache/hbase/pull/3359
> >
> > This change modifies the meta information for backup. If it is not part of
> > HBase 3.0.0, executing the above-mentioned plan might require a lot of
> > additional work.
> >
> > ---
> > Mallikarjun
> >
> >
> > On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun 
> > wrote:
> >
> > > Slight modification to previous version --> https://ibb.co/Nttx3J1
> > >
> > > ---
> > > Mallikarjun
> > >
> > >
> > > On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun 
> > > wrote:
> > >
> > >> Inline Reply
> > >>
> > >> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey  wrote:
> > >>
> > >>> Hi Mallikarjun,
> > >>>
> > >>> Those goals sound worthwhile.
> > >>>
> > >>> Do you have a flow chart similar to the one you posted for the
> current
> > >>> system but for the proposed solution?
> > >>>
> > >>
> > >> This is what I am thinking --> https://ibb.co/KmH6Cwv
> > >>
> > >>
> > >>>
> > >>> How much will we need to change our existing test coverage to
> accommodate
> > >>> the proposed solution?
> > >>>
> > >>
> > >> Of the 38 tests, it looks like we might have to change a couple only.
> > >> Will have to add more tests to cover parallel backup scenarios.
> > >>
> > >>
> > >>>
> > >>> How much will we need to update the existing reference guide section?
> > >>>
> > >>>
> > >> Probably nothing. Interface as such will not change.
> > >>
> > >>
> > >>>
> > >>> On Sun, Jan 31, 2021, 04:59 Mallikarjun 
> > >>> wrote:
> > >>>
> > >>> > Bringing up this thread.
> > >>> >
> > >>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani 
> wrote:
> > >>> >
> > >>> > > Thanks, the image is visible now.
> > >>> > >
> > >>> > > > Since I wanted to open this for discussion, did not consider
> > >>> placing it
> > >>> > > in
> > >>> > > *hbase/dev_support/design-docs*.
> > >>> > >
> > >>> > > Definitely, only after we come to concrete conclusion with the
> > >>> reviewer,
> > >>> > we
> > >>> > > should open up a PR. Until then this thread is anyways up for
> > >>> discussion.
> > >>> > >
> > >>> > >
> > >>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <
> > >>> mallik.v.ar...@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > Hope this link works --> https://ibb.co/hYjRpgP
> > >>> > > >
> > >>> > > > Inline reply
> > >>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani <
> vjas...@apache.org>
> > >>> > wrote:
> > >>> > > >
> > >>> > > > > Hi,
> > >>> > > > >
> > >>> > > > > Still not available :)
> > >>> > > > > The attachments don’t work on mailing lists. You can try
> > >>> uploading
> > >>> > the
> > >>> > > > > attachment on some public hosting site and provide the url
> to the
> > >>> > same
> > >>> > > > > here.
> > >>> > > > >
> > >>> > > > > Since I am not aware of the contents, I cannot confirm right
> > >>> away but
> > >>> > > if
> > >>> > > > > the reviewer feels we should have the attachment on our
> github
> > >>> repo:
> > >>> > > > > hbase/dev-support/design-docs , good to upload the content
> there
> > >>> > later.
> > >>> > > > For
> > >>> > > > > instance, pdf file can contain existing design and new design
> > >>> > diagrams
> > >>> > > > and
> > >>> > > > > talk about pros and cons etc once we have things finalized.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > Since I wanted to open this for discussion, did not consider
> > >>> placing it
> > >>> > > in
> > >>> > > > *hbase/dev_support/design-docs*.
> > >>> > > >
> > >>> > > >
> > >>> > > > >
> > >>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
> > >>> > mallik.v.ar...@gmail.com
> > >>> > > >
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > > > Attached as image. Please let me know if it is available
> now.
> > >>> > > > > >
> > >>> > > > > > ---
> > >>> > > > > > Mallikarjun
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <
> > >>> bus...@apache.org>
> > >>> > > > wrote:
> > >>> > > > > >
> > >>> > > > > >> Hi!
> > >>> > > > > >>
> > >>> > > > > >> Thanks for the write up. unfortunately, your image for the
> > >>> > existing
> > >>> > > > > >> design didn't come through. Could you post it to some
> host and
> > >>> > link
> > >>> > > it
> > >>> > > > > >> here?
> > >>> > > > > >>
> > >>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
> > >>> > > mallik.v.ar...@gmail.com
> > >>> > > > >
> > >>> > > > > >> wrote:
> > >>> > > > > >> >
> > >>> > > > > >> > Existing Design:
> > >>> > > > > >> >
> > >>> > > > > >> >
> > >>> > > > > >> >
> > >>> > > > > >> > Problem 1:
> > >>> > > > > >> >
> > >>> > > > > >> > With this design, Incremental and Full backup can't be
> run
> > >>> in
> > >>> > > > parallel
> > >>> > > > > >> and leading to degraded RPO's in case Full backup is of
> longer
> > >>> > > 

Re: [DISCUSS] Hbase Backup design changes

2021-07-25 Thread Duo Zhang
Replied on jira. Please give more details about what you are doing in the PR...

Thanks.

On Sun, Jul 25, 2021 at 10:48 AM Mallikarjun wrote:
>
> Can someone review this pull request?
> https://github.com/apache/hbase/pull/3359
>
> This change modifies the meta information for backup. If it is not part of
> HBase 3.0.0, executing the above-mentioned plan might require a lot of
> additional work.
>
> ---
> Mallikarjun
>
>
> On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun 
> wrote:
>
> > Slight modification to previous version --> https://ibb.co/Nttx3J1
> >
> > ---
> > Mallikarjun
> >
> >
> > On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun 
> > wrote:
> >
> >> Inline Reply
> >>
> >> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey  wrote:
> >>
> >>> Hi Mallikarjun,
> >>>
> >>> Those goals sound worthwhile.
> >>>
> >>> Do you have a flow chart similar to the one you posted for the current
> >>> system but for the proposed solution?
> >>>
> >>
> >> This is what I am thinking --> https://ibb.co/KmH6Cwv
> >>
> >>
> >>>
> >>> How much will we need to change our existing test coverage to accommodate
> >>> the proposed solution?
> >>>
> >>
> >> Of the 38 tests, it looks like we might have to change a couple only.
> >> Will have to add more tests to cover parallel backup scenarios.
> >>
> >>
> >>>
> >>> How much will we need to update the existing reference guide section?
> >>>
> >>>
> >> Probably nothing. Interface as such will not change.
> >>
> >>
> >>>
> >>> On Sun, Jan 31, 2021, 04:59 Mallikarjun 
> >>> wrote:
> >>>
> >>> > Bringing up this thread.
> >>> >
> >>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
> >>> >
> >>> > > Thanks, the image is visible now.
> >>> > >
> >>> > > > Since I wanted to open this for discussion, did not consider
> >>> placing it
> >>> > > in
> >>> > > *hbase/dev_support/design-docs*.
> >>> > >
> >>> > > Definitely, only after we come to concrete conclusion with the
> >>> reviewer,
> >>> > we
> >>> > > should open up a PR. Until then this thread is anyways up for
> >>> discussion.
> >>> > >
> >>> > >
> >>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <
> >>> mallik.v.ar...@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hope this link works --> https://ibb.co/hYjRpgP
> >>> > > >
> >>> > > > Inline reply
> >>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
> >>> > wrote:
> >>> > > >
> >>> > > > > Hi,
> >>> > > > >
> >>> > > > > Still not available :)
> >>> > > > > The attachments don’t work on mailing lists. You can try
> >>> uploading
> >>> > the
> >>> > > > > attachment on some public hosting site and provide the url to the
> >>> > same
> >>> > > > > here.
> >>> > > > >
> >>> > > > > Since I am not aware of the contents, I cannot confirm right
> >>> away but
> >>> > > if
> >>> > > > > the reviewer feels we should have the attachment on our github
> >>> repo:
> >>> > > > > hbase/dev-support/design-docs , good to upload the content there
> >>> > later.
> >>> > > > For
> >>> > > > > instance, pdf file can contain existing design and new design
> >>> > diagrams
> >>> > > > and
> >>> > > > > talk about pros and cons etc once we have things finalized.
> >>> > > > >
> >>> > > > >
> >>> > > > Since I wanted to open this for discussion, did not consider
> >>> placing it
> >>> > > in
> >>> > > > *hbase/dev_support/design-docs*.
> >>> > > >
> >>> > > >
> >>> > > > >
> >>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
> >>> > mallik.v.ar...@gmail.com
> >>> > > >
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > > > Attached as image. Please let me know if it is available now.
> >>> > > > > >
> >>> > > > > > ---
> >>> > > > > > Mallikarjun
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <
> >>> bus...@apache.org>
> >>> > > > wrote:
> >>> > > > > >
> >>> > > > > >> Hi!
> >>> > > > > >>
> >>> > > > > >> Thanks for the write up. unfortunately, your image for the
> >>> > existing
> >>> > > > > >> design didn't come through. Could you post it to some host and
> >>> > link
> >>> > > it
> >>> > > > > >> here?
> >>> > > > > >>
> >>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
> >>> > > mallik.v.ar...@gmail.com
> >>> > > > >
> >>> > > > > >> wrote:
> >>> > > > > >> >
> >>> > > > > >> > Existing Design:
> >>> > > > > >> >
> >>> > > > > >> >
> >>> > > > > >> >
> >>> > > > > >> > Problem 1:
> >>> > > > > >> >
> >>> > > > > >> > With this design, Incremental and Full backup can't be run
> >>> in
> >>> > > > parallel
> >>> > > > > >> and leading to degraded RPO's in case Full backup is of longer
> >>> > > > duration
> >>> > > > > esp
> >>> > > > > >> for large tables.
> >>> > > > > >> >
> >>> > > > > >> > Example:
> >>> > > > > >> > Expectation: Say you have a big table with 10 TB and your
> >>> RPO is
> >>> > > 60
> >>> > > > > >> minutes and you are allowed to ship the remote backup with 800
> >>> > Mbps.
> >>> > > > And
> >>> > > > > >> you are allowed to take Full Backups once in a week and rest
> >>> of
> >>> > them
> >>> > > > > should
> >>> > > > 

Re: [DISCUSS] Hbase Backup design changes

2021-07-24 Thread Mallikarjun
Can someone review this pull request?
https://github.com/apache/hbase/pull/3359

This change modifies the meta information for backup. If it is not part of
HBase 3.0.0, executing the above-mentioned plan might require a lot of
additional work.

---
Mallikarjun


On Thu, Feb 11, 2021 at 5:36 PM Mallikarjun 
wrote:

> Slight modification to previous version --> https://ibb.co/Nttx3J1
>
> ---
> Mallikarjun
>
>
> On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun 
> wrote:
>
>> Inline Reply
>>
>> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey  wrote:
>>
>>> Hi Mallikarjun,
>>>
>>> Those goals sound worthwhile.
>>>
>>> Do you have a flow chart similar to the one you posted for the current
>>> system but for the proposed solution?
>>>
>>
>> This is what I am thinking --> https://ibb.co/KmH6Cwv
>>
>>
>>>
>>> How much will we need to change our existing test coverage to accommodate
>>> the proposed solution?
>>>
>>
>> Of the 38 tests, it looks like we might have to change a couple only.
>> Will have to add more tests to cover parallel backup scenarios.
>>
>>
>>>
>>> How much will we need to update the existing reference guide section?
>>>
>>>
>> Probably nothing. Interface as such will not change.
>>
>>
>>>
>>> On Sun, Jan 31, 2021, 04:59 Mallikarjun 
>>> wrote:
>>>
>>> > Bringing up this thread.
>>> >
>>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
>>> >
>>> > > Thanks, the image is visible now.
>>> > >
>>> > > > Since I wanted to open this for discussion, did not consider
>>> placing it
>>> > > in
>>> > > *hbase/dev_support/design-docs*.
>>> > >
>>> > > Definitely, only after we come to concrete conclusion with the
>>> reviewer,
>>> > we
>>> > > should open up a PR. Until then this thread is anyways up for
>>> discussion.
>>> > >
>>> > >
>>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun <
>>> mallik.v.ar...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hope this link works --> https://ibb.co/hYjRpgP
>>> > > >
>>> > > > Inline reply
>>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
>>> > wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > Still not available :)
>>> > > > > The attachments don’t work on mailing lists. You can try
>>> uploading
>>> > the
>>> > > > > attachment on some public hosting site and provide the url to the
>>> > same
>>> > > > > here.
>>> > > > >
>>> > > > > Since I am not aware of the contents, I cannot confirm right
>>> away but
>>> > > if
>>> > > > > the reviewer feels we should have the attachment on our github
>>> repo:
>>> > > > > hbase/dev-support/design-docs , good to upload the content there
>>> > later.
>>> > > > For
>>> > > > > instance, pdf file can contain existing design and new design
>>> > diagrams
>>> > > > and
>>> > > > > talk about pros and cons etc once we have things finalized.
>>> > > > >
>>> > > > >
>>> > > > Since I wanted to open this for discussion, did not consider
>>> placing it
>>> > > in
>>> > > > *hbase/dev_support/design-docs*.
>>> > > >
>>> > > >
>>> > > > >
>>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
>>> > mallik.v.ar...@gmail.com
>>> > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Attached as image. Please let me know if it is available now.
>>> > > > > >
>>> > > > > > ---
>>> > > > > > Mallikarjun
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey <
>>> bus...@apache.org>
>>> > > > wrote:
>>> > > > > >
>>> > > > > >> Hi!
>>> > > > > >>
>>> > > > > >> Thanks for the write up. unfortunately, your image for the
>>> > existing
>>> > > > > >> design didn't come through. Could you post it to some host and
>>> > link
>>> > > it
>>> > > > > >> here?
>>> > > > > >>
>>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
>>> > > mallik.v.ar...@gmail.com
>>> > > > >
>>> > > > > >> wrote:
>>> > > > > >> >
>>> > > > > >> > Existing Design:
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> >
>>> > > > > >> > Problem 1:
>>> > > > > >> >
>>> > > > > >> > With this design, Incremental and Full backup can't be run
>>> in
>>> > > > parallel
>>> > > > > >> and leading to degraded RPO's in case Full backup is of longer
>>> > > > duration
>>> > > > > esp
>>> > > > > >> for large tables.
>>> > > > > >> >
>>> > > > > >> > Example:
>>> > > > > >> > Expectation: Say you have a big table with 10 TB and your
>>> RPO is
>>> > > 60
>>> > > > > >> minutes and you are allowed to ship the remote backup with 800
>>> > Mbps.
>>> > > > And
>>> > > > > >> you are allowed to take Full Backups once in a week and rest
>>> of
>>> > them
>>> > > > > should
>>> > > > > >> be incremental backups
>>> > > > > >> >
>>> > > > > >> > Shortcoming: With the above design, one can't run parallel
>>> > backups
>>> > > > and
>>> > > > > >> whenever there is a full backup running (which takes roughly
>>> 25
>>> > > hours)
>>> > > > > you
>>> > > > > >> are not allowed to take incremental backups and that would be
>>> a
>>> > > breach
>>> > > > > in
>>> > > > > >> your RPO.
>>> > > > > >> >
>>> > > > > >> > Proposed Solution: Barring some 

Re: [DISCUSS] Hbase Backup design changes

2021-02-11 Thread Mallikarjun
Slight modification to the previous version --> https://ibb.co/Nttx3J1

---
Mallikarjun


On Thu, Feb 11, 2021 at 8:12 AM Mallikarjun 
wrote:

> Inline Reply
>
> On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey  wrote:
>
>> Hi Mallikarjun,
>>
>> Those goals sound worthwhile.
>>
>> Do you have a flow chart similar to the one you posted for the current
>> system but for the proposed solution?
>>
>
> This is what I am thinking --> https://ibb.co/KmH6Cwv
>
>
>>
>> How much will we need to change our existing test coverage to accommodate
>> the proposed solution?
>>
>
> Of the 38 tests, it looks like we might have to change a couple only.
> Will have to add more tests to cover parallel backup scenarios.
>
>
>>
>> How much will we need to update the existing reference guide section?
>>
>>
> Probably nothing. Interface as such will not change.
>
>
>>
>> On Sun, Jan 31, 2021, 04:59 Mallikarjun  wrote:
>>
>> > Bringing up this thread.
>> >
>> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
>> >
>> > > Thanks, the image is visible now.
>> > >
>> > > > Since I wanted to open this for discussion, did not consider
>> placing it
>> > > in
>> > > *hbase/dev_support/design-docs*.
>> > >
>> > > Definitely, only after we come to concrete conclusion with the
>> reviewer,
>> > we
>> > > should open up a PR. Until then this thread is anyways up for
>> discussion.
>> > >
>> > >
>> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun wrote:
>> > >
>> > > > Hope this link works --> https://ibb.co/hYjRpgP
>> > > >
>> > > > Inline reply
>> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
>> > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Still not available :)
>> > > > > The attachments don’t work on mailing lists. You can try uploading
>> > the
>> > > > > attachment on some public hosting site and provide the url to the
>> > same
>> > > > > here.
>> > > > >
>> > > > > Since I am not aware of the contents, I cannot confirm right away
>> but
>> > > if
>> > > > > the reviewer feels we should have the attachment on our github
>> repo:
>> > > > > hbase/dev-support/design-docs , good to upload the content there
>> > later.
>> > > > For
>> > > > > instance, pdf file can contain existing design and new design
>> > diagrams
>> > > > and
>> > > > > talk about pros and cons etc once we have things finalized.
>> > > > >
>> > > > >
>> > > > Since I wanted to open this for discussion, did not consider
>> placing it
>> > > in
>> > > > *hbase/dev_support/design-docs*.
>> > > >
>> > > >
>> > > > >
>> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
>> > mallik.v.ar...@gmail.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Attached as image. Please let me know if it is available now.
>> > > > > >
>> > > > > > ---
>> > > > > > Mallikarjun
>> > > > > >
>> > > > > >
>> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey wrote:
>> > > > > >
>> > > > > >> Hi!
>> > > > > >>
>> > > > > >> Thanks for the write up. unfortunately, your image for the
>> > existing
>> > > > > >> design didn't come through. Could you post it to some host and
>> > link
>> > > it
>> > > > > >> here?
>> > > > > >>
>> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
>> > > mallik.v.ar...@gmail.com
>> > > > >
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > Existing Design:
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Problem 1:
>> > > > > >> >
>> > > > > >> > With this design, Incremental and Full backup can't be run in
>> > > > parallel
>> > > > > >> and leading to degraded RPO's in case Full backup is of longer
>> > > > duration
>> > > > > esp
>> > > > > >> for large tables.
>> > > > > >> >
>> > > > > >> > Example:
>> > > > > >> > Expectation: Say you have a big table with 10 TB and your
>> RPO is
>> > > 60
>> > > > > >> minutes and you are allowed to ship the remote backup with 800
>> > Mbps.
>> > > > And
>> > > > > >> you are allowed to take Full Backups once in a week and rest of
>> > them
>> > > > > should
>> > > > > >> be incremental backups
>> > > > > >> >
>> > > > > >> > Shortcoming: With the above design, one can't run parallel
>> > backups
>> > > > and
>> > > > > >> whenever there is a full backup running (which takes roughly 25
>> > > hours)
>> > > > > you
>> > > > > >> are not allowed to take incremental backups and that would be a
>> > > breach
>> > > > > in
>> > > > > >> your RPO.
>> > > > > >> >
>> > > > > >> > Proposed Solution: Barring some critical sections such as
>> > > modifying
>> > > > > >> state of the backup on meta tables, others can happen
>> parallelly.
>> > > > > Leaving
>> > > > > >> incremental backups to be able to run based on older successful
>> > > full /
>> > > > > >> incremental backups and completion time of backup should be
>> used
>> > > > > instead of
>> > > > > >> start time of backup for ordering. I have not worked on the
>> full
>> > > > > redesign,
>> > > > > >> and will be doing so if this proposal seems acceptable for the
>> > > > > community.
>> > > > > >> >
>> > > 

Re: [DISCUSS] Hbase Backup design changes

2021-02-10 Thread Mallikarjun
Inline Reply

On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote:

> Hi Mallikarjun,
>
> Those goals sound worthwhile.
>
> Do you have a flow chart similar to the one you posted for the current
> system but for the proposed solution?
>

This is what I am thinking --> https://ibb.co/KmH6Cwv


>
> How much will we need to change our existing test coverage to accommodate
> the proposed solution?
>

Of the 38 tests, it looks like we might have to change only a couple.
We will have to add more tests to cover parallel backup scenarios.


>
> How much will we need to update the existing reference guide section?
>
>
Probably nothing. The interface as such will not change.


>
> On Sun, Jan 31, 2021, 04:59 Mallikarjun  wrote:
>
> > Bringing up this thread.
> >
> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
> >
> > > Thanks, the image is visible now.
> > >
> > > > Since I wanted to open this for discussion, did not consider placing
> it
> > > in
> > > *hbase/dev_support/design-docs*.
> > >
> > > Definitely, only after we come to concrete conclusion with the
> reviewer,
> > we
> > > should open up a PR. Until then this thread is anyways up for
> discussion.
> > >
> > >
> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun 
> > > wrote:
> > >
> > > > Hope this link works --> https://ibb.co/hYjRpgP
> > > >
> > > > Inline reply
> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Still not available :)
> > > > > The attachments don’t work on mailing lists. You can try uploading
> > the
> > > > > attachment on some public hosting site and provide the url to the
> > same
> > > > > here.
> > > > >
> > > > > Since I am not aware of the contents, I cannot confirm right away
> but
> > > if
> > > > > the reviewer feels we should have the attachment on our github
> repo:
> > > > > hbase/dev-support/design-docs , good to upload the content there
> > later.
> > > > For
> > > > > instance, pdf file can contain existing design and new design
> > diagrams
> > > > and
> > > > > talk about pros and cons etc once we have things finalized.
> > > > >
> > > > >
> > > > Since I wanted to open this for discussion, did not consider placing
> it
> > > in
> > > > *hbase/dev_support/design-docs*.
> > > >
> > > >
> > > > >
> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
> > mallik.v.ar...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > > Attached as image. Please let me know if it is available now.
> > > > > >
> > > > > > ---
> > > > > > Mallikarjun
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey 
> > > > wrote:
> > > > > >
> > > > > >> Hi!
> > > > > >>
> > > > > >> Thanks for the write up. unfortunately, your image for the
> > existing
> > > > > >> design didn't come through. Could you post it to some host and
> > link
> > > it
> > > > > >> here?
> > > > > >>
> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
> > > mallik.v.ar...@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > > >> > Existing Design:
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Problem 1:
> > > > > > >> >
> > > > > > >> > With this design, incremental and full backups can't be run
> > > > > > >> > in parallel, leading to degraded RPOs when a full backup
> > > > > > >> > takes a long time, especially for large tables.
> > > > > > >> >
> > > > > > >> > Example:
> > > > > > >> > Expectation: Say you have a big table of 10 TB, your RPO is
> > > > > > >> > 60 minutes, and you are allowed to ship the remote backup at
> > > > > > >> > 800 Mbps. You are allowed to take full backups once a week,
> > > > > > >> > and the rest should be incremental backups.
> > > > > > >> >
> > > > > > >> > Shortcoming: With the above design, one can't run parallel
> > > > > > >> > backups. Whenever a full backup is running (which takes
> > > > > > >> > roughly 25 hours), you are not allowed to take incremental
> > > > > > >> > backups, and that would be a breach of your RPO.
> > > > > > >> >
> > > > > > >> > Proposed Solution: Barring some critical sections, such as
> > > > > > >> > modifying the state of the backup on meta tables, everything
> > > > > > >> > else can happen in parallel. Incremental backups should be
> > > > > > >> > able to run based on older successful full / incremental
> > > > > > >> > backups, and the completion time of a backup should be used
> > > > > > >> > instead of its start time for ordering. I have not worked on
> > > > > > >> > the full redesign, and will do so if this proposal seems
> > > > > > >> > acceptable to the community.
> > > > > > >> >
> > > > > > >> > Problem 2:
> > > > > > >> >
> > > > > > >> > With one backup at a time, it fails easily for a multi-tenant
> > > > > > >> > system. This poses the following problems:
> > > > > > >> >
> > > > > > >> > Admins will not be able to achieve the required RPOs for
> > > > > > >> > their tables because of dependence on other tenants

Re: [DISCUSS] Hbase Backup design changes

2021-02-08 Thread Mallikarjun
Hi Sean,

I will get back with the design changes and answers to the above questions
in a few days' time.

---
Mallikarjun


On Wed, Feb 3, 2021 at 6:44 AM Sean Busbey wrote:

> Hi Mallikarjun,
>
> Those goals sound worthwhile.
>
> Do you have a flow chart similar to the one you posted for the current
> system but for the proposed solution?
>
> How much will we need to change our existing test coverage to accommodate
> the proposed solution?
>
> How much will we need to update the existing reference guide section?
>
>
> On Sun, Jan 31, 2021, 04:59 Mallikarjun  wrote:
>
> > Bringing up this thread.
> >
> > On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
> >
> > > Thanks, the image is visible now.
> > >
> > > > Since I wanted to open this for discussion, did not consider placing
> it
> > > in
> > > *hbase/dev_support/design-docs*.
> > >
> > > Definitely, only after we come to concrete conclusion with the
> reviewer,
> > we
> > > should open up a PR. Until then this thread is anyways up for
> discussion.
> > >
> > >
> > > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun 
> > > wrote:
> > >
> > > > Hope this link works --> https://ibb.co/hYjRpgP
> > > >
> > > > Inline reply
> > > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Still not available :)
> > > > > The attachments don’t work on mailing lists. You can try uploading
> > the
> > > > > attachment on some public hosting site and provide the url to the
> > same
> > > > > here.
> > > > >
> > > > > Since I am not aware of the contents, I cannot confirm right away
> but
> > > if
> > > > > the reviewer feels we should have the attachment on our github
> repo:
> > > > > hbase/dev-support/design-docs , good to upload the content there
> > later.
> > > > For
> > > > > instance, pdf file can contain existing design and new design
> > diagrams
> > > > and
> > > > > talk about pros and cons etc once we have things finalized.
> > > > >
> > > > >
> > > > Since I wanted to open this for discussion, did not consider placing
> it
> > > in
> > > > *hbase/dev_support/design-docs*.
> > > >
> > > >
> > > > >
> > > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
> > mallik.v.ar...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > > Attached as image. Please let me know if it is available now.
> > > > > >
> > > > > > ---
> > > > > > Mallikarjun
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey 
> > > > wrote:
> > > > > >
> > > > > >> Hi!
> > > > > >>
> > > > > >> Thanks for the write up. unfortunately, your image for the
> > existing
> > > > > >> design didn't come through. Could you post it to some host and
> > link
> > > it
> > > > > >> here?
> > > > > >>
> > > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
> > > mallik.v.ar...@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > > >> > Existing Design:
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Problem 1:
> > > > > > >> >
> > > > > > >> > With this design, incremental and full backups can't be run
> > > > > > >> > in parallel, leading to degraded RPOs when a full backup
> > > > > > >> > takes a long time, especially for large tables.
> > > > > > >> >
> > > > > > >> > Example:
> > > > > > >> > Expectation: Say you have a big table of 10 TB, your RPO is
> > > > > > >> > 60 minutes, and you are allowed to ship the remote backup at
> > > > > > >> > 800 Mbps. You are allowed to take full backups once a week,
> > > > > > >> > and the rest should be incremental backups.
> > > > > > >> >
> > > > > > >> > Shortcoming: With the above design, one can't run parallel
> > > > > > >> > backups. Whenever a full backup is running (which takes
> > > > > > >> > roughly 25 hours), you are not allowed to take incremental
> > > > > > >> > backups, and that would be a breach of your RPO.
> > > > > > >> >
> > > > > > >> > Proposed Solution: Barring some critical sections, such as
> > > > > > >> > modifying the state of the backup on meta tables, everything
> > > > > > >> > else can happen in parallel. Incremental backups should be
> > > > > > >> > able to run based on older successful full / incremental
> > > > > > >> > backups, and the completion time of a backup should be used
> > > > > > >> > instead of its start time for ordering. I have not worked on
> > > > > > >> > the full redesign, and will do so if this proposal seems
> > > > > > >> > acceptable to the community.
> > > > > > >> >
> > > > > > >> > Problem 2:
> > > > > > >> >
> > > > > > >> > With one backup at a time, it fails easily for a multi-tenant
> > > > > > >> > system. This poses the following problems:
> > > > > > >> >
> > > > > > >> > Admins will not be able to achieve the required RPOs for
> > > > > > >> > their tables because of dependence on other tenants present
> > > > > > >> > in the system. As one tenant doesn't have control over other
> > > > > > >> > tenants' table sizes and hence the
> > > > > 

Re: [DISCUSS] Hbase Backup design changes

2021-02-02 Thread Sean Busbey
Hi Mallikarjun,

Those goals sound worthwhile.

Do you have a flow chart similar to the one you posted for the current
system but for the proposed solution?

How much will we need to change our existing test coverage to accommodate
the proposed solution?

How much will we need to update the existing reference guide section?


On Sun, Jan 31, 2021, 04:59 Mallikarjun wrote:

> Bringing up this thread.
>
> On Mon, Jan 25, 2021, 3:38 PM Viraj Jasani  wrote:
>
> > Thanks, the image is visible now.
> >
> > > Since I wanted to open this for discussion, did not consider placing it
> > in
> > *hbase/dev_support/design-docs*.
> >
> > Definitely, only after we come to concrete conclusion with the reviewer,
> we
> > should open up a PR. Until then this thread is anyways up for discussion.
> >
> >
> > On Mon, 25 Jan 2021 at 1:58 PM, Mallikarjun 
> > wrote:
> >
> > > Hope this link works --> https://ibb.co/hYjRpgP
> > >
> > > Inline reply
> > > On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > Still not available :)
> > > > The attachments don’t work on mailing lists. You can try uploading
> the
> > > > attachment on some public hosting site and provide the url to the
> same
> > > > here.
> > > >
> > > > Since I am not aware of the contents, I cannot confirm right away but
> > if
> > > > the reviewer feels we should have the attachment on our github repo:
> > > > hbase/dev-support/design-docs , good to upload the content there
> later.
> > > For
> > > > instance, pdf file can contain existing design and new design
> diagrams
> > > and
> > > > talk about pros and cons etc once we have things finalized.
> > > >
> > > >
> > > Since I wanted to open this for discussion, did not consider placing it
> > in
> > > *hbase/dev_support/design-docs*.
> > >
> > >
> > > >
> > > > On Mon, 25 Jan 2021 at 12:13 PM, Mallikarjun <
> mallik.v.ar...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Attached as image. Please let me know if it is availabe now.
> > > > >
> > > > > ---
> > > > > Mallikarjun
> > > > >
> > > > >
> > > > > On Mon, Jan 25, 2021 at 10:32 AM Sean Busbey 
> > > wrote:
> > > > >
> > > > >> Hi!
> > > > >>
> > > > >> Thanks for the write up. unfortunately, your image for the
> existing
> > > > >> design didn't come through. Could you post it to some host and
> link
> > it
> > > > >> here?
> > > > >>
> > > > >> On Sun, Jan 24, 2021 at 3:12 AM Mallikarjun <
> > mallik.v.ar...@gmail.com
> > > >
> > > > >> wrote:
> > > > >> >
> > > > >> > Existing Design:
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > Problem 1:
> > > > >> >
> > > > >> > With this design, Incremental and Full backup can't be run in
> > > parallel
> > > > >> and leading to degraded RPO's in case Full backup is of longer
> > > duration
> > > > esp
> > > > >> for large tables.
> > > > >> >
> > > > >> > Example:
> > > > >> > Expectation: Say you have a big table with 10 TB and your RPO is
> > 60
> > > > >> minutes and you are allowed to ship the remote backup with 800
> Mbps.
> > > And
> > > > >> you are allowed to take Full Backups once in a week and rest of
> them
> > > > should
> > > > >> be incremental backups
> > > > >> >
> > > > >> > Shortcoming: With the above design, one can't run parallel
> backups
> > > and
> > > > >> whenever there is a full backup running (which takes roughly 25
> > hours)
> > > > you
> > > > >> are not allowed to take incremental backups and that would be a
> > breach
> > > > in
> > > > >> your RPO.
> > > > >> >
> > > > >> > Proposed Solution: Barring some critical sections such as
> > modifying
> > > > >> state of the backup on meta tables, others can happen parallelly.
> > > > Leaving
> > > > >> incremental backups to be able to run based on older successful
> > full /
> > > > >> incremental backups and completion time of backup should be used
> > > > instead of
> > > > >> start time of backup for ordering. I have not worked on the full
> > > > redesign,
> > > > >> and will be doing so if this proposal seems acceptable for the
> > > > community.
> > > > >> >
> > > > >> > Problem 2:
> > > > >> >
> > > > >> > With one backup at a time, it fails easily for a multi-tenant
> > > system.
> > > > >> This poses following problems
> > > > >> >
> > > > >> > Admins will not be able to achieve required RPO's for their
> tables
> > > > >> because of dependence on other tenants present in the system. As
> one
> > > > tenant
> > > > >> doesn't have control over other tenants' table sizes and hence the
> > > > duration
> > > > >> of the backup
> > > > >> > Management overhead of setting up a right sequence to achieve
> > > required
> > > > >> RPO's for different tenants could be very hard.
> > > > >> >
> > > > >> > Proposed Solution: Same as previous proposal
> > > > >> >
> > > > >> > Problem 3:
> > > > >> >
> > > > >> > Incremental backup works on WAL's and
> > > > >> org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures
> that
> > > > WAL's
> > > > >> are never cleaned up until the next 

Re: [DISCUSS] Hbase Backup design changes

2021-01-31 Thread Mallikarjun
Bringing up this thread.



Re: [DISCUSS] Hbase Backup design changes

2021-01-25 Thread Viraj Jasani
Thanks, the image is visible now.

> Since I wanted to open this for discussion, did not consider placing it in
> *hbase/dev-support/design-docs*.

Definitely. We should open a PR only after we reach a concrete conclusion
with the reviewers. Until then, this thread is open for discussion anyway.




Re: [DISCUSS] Hbase Backup design changes

2021-01-25 Thread Mallikarjun
Hope this link works --> https://ibb.co/hYjRpgP

Inline reply
On Mon, Jan 25, 2021 at 1:16 PM Viraj Jasani  wrote:

> Hi,
>
> Still not available :)
> The attachments don’t work on mailing lists. You can try uploading the
> attachment on some public hosting site and provide the url to the same
> here.
>
> Since I am not aware of the contents, I cannot confirm right away but if
> the reviewer feels we should have the attachment on our github repo:
> hbase/dev-support/design-docs , good to upload the content there later. For
> instance, pdf file can contain existing design and new design diagrams and
> talk about pros and cons etc once we have things finalized.
>
>
Since I wanted to open this up for discussion first, I did not consider
placing it in *hbase/dev-support/design-docs*.




Re: [DISCUSS] Hbase Backup design changes

2021-01-24 Thread Viraj Jasani
Hi,

Still not available :)
Attachments don’t work on mailing lists. You can try uploading the
attachment to a public hosting site and posting the URL here.

Since I am not aware of the contents, I cannot confirm right away, but if
the reviewer feels we should have the attachment in our GitHub repo under
hbase/dev-support/design-docs, it would be good to upload the content there
later. For instance, a PDF file can contain the existing and new design
diagrams and discuss the pros and cons once we have things finalized.




Re: [DISCUSS] Hbase Backup design changes

2021-01-24 Thread Mallikarjun
Attached as an image. Please let me know if it is available now.

---
Mallikarjun




Re: [DISCUSS] Hbase Backup design changes

2021-01-24 Thread Sean Busbey
Hi!

Thanks for the write-up. Unfortunately, your image for the existing
design didn't come through. Could you post it to some host and link it
here?



[DISCUSS] Hbase Backup design changes

2021-01-24 Thread Mallikarjun
*Existing Design:*

[image: image.png]

*Problem 1:*

With this design, incremental and full backups can't run in parallel, which
degrades RPOs when a full backup takes a long time, especially for large
tables.

Example:
Expectation: Say you have a large 10 TB table, your RPO is 60 minutes, and
you are allowed to ship the remote backup at 800 Mbps. You are allowed to
take a full backup once a week, and the rest should be incremental
backups.

Shortcoming: With the above design, backups can't run in parallel. Whenever
a full backup is running (which takes roughly 25 hours), you are not allowed
to take incremental backups, and that breaches your RPO.
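
As a rough check of the numbers in this example, here is a small sketch of
the transfer-time arithmetic. It assumes decimal units (1 TB = 10^12 bytes),
a fully saturated link, and no compression or protocol overhead; the function
and variable names are only for illustration. Under those assumptions the
estimate lands in the same range as the figure quoted above, a little higher.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to ship size_bytes over the link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second / 3600


TABLE_BYTES = 10 * 10**12   # 10 TB table, decimal units (assumption)
LINK_BPS = 800 * 10**6      # 800 Mbps link to the remote backup site
RPO_HOURS = 1.0             # 60-minute RPO

full_backup_hours = transfer_hours(TABLE_BYTES, LINK_BPS)
missed_windows = full_backup_hours / RPO_HOURS

print(f"full backup: about {full_backup_hours:.1f} hours")
print(f"RPO windows with no incremental backup: about {missed_windows:.0f}")
```

So for more than a day per week, no incremental backup can be taken for this
table while the full backup holds the single backup slot.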

*Proposed Solution:* Barring some critical sections, such as modifying the
state of the backup in the meta tables, everything else can happen in
parallel. Incremental backups should be able to run based on older
successful full / incremental backups, and the completion time of a backup
should be used instead of its start time for ordering. I have not worked on
the full redesign, and will do so if this proposal seems acceptable to the
community.
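
To make the ordering idea concrete, here is a minimal sketch. It is not the
existing backup code: `BackupRecord`, `completed_in_order` and
`base_for_incremental` are hypothetical names, and the timestamps are made-up
hour counts. It only shows how ordering by completion time lets an
incremental backup pick an older successful backup as its base while a full
backup is still running.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    kind: str                    # "full" or "incremental"
    start_ts: int                # when the backup started
    complete_ts: Optional[int]   # None while the backup is still running


def completed_in_order(records: Iterable[BackupRecord]) -> List[BackupRecord]:
    """Completed backups, ordered by completion time rather than start time."""
    done = [r for r in records if r.complete_ts is not None]
    return sorted(done, key=lambda r: r.complete_ts)


def base_for_incremental(records: Iterable[BackupRecord]) -> Optional[BackupRecord]:
    """Most recently completed backup; backups still running are ignored."""
    ordered = completed_in_order(records)
    return ordered[-1] if ordered else None


history = [
    BackupRecord("full-1", "full", start_ts=0, complete_ts=25),
    BackupRecord("incr-1", "incremental", start_ts=26, complete_ts=27),
    BackupRecord("full-2", "full", start_ts=28, complete_ts=None),  # running
]

base = base_for_incremental(history)
print(base.backup_id)
```

Here the new incremental backup would build on `incr-1` instead of waiting
for `full-2` to finish.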

*Problem 2:*

With one backup at a time, it fails easily for a multi-tenant system. This
poses the following problems:

   - Admins will not be able to achieve the required RPOs for their tables
   because they depend on the other tenants present in the system: one tenant
   has no control over other tenants' table sizes, and hence over the
   duration of their backups.
   - The management overhead of setting up the right sequence to achieve the
   required RPOs for different tenants could be very high.

*Proposed Solution:* Same as the previous proposal.

*Problem 3:*

Incremental backup works on WALs, and
org.apache.hadoop.hbase.backup.master.BackupLogCleaner ensures that WALs
are never cleaned up until the next backup (full / incremental) is taken.
This poses the following problem:

   - WALs can grow unbounded when there are transient problems, such as the
   backup site facing issues, until the next scheduled backup succeeds.

*Proposed Solution:* I can't think of anything better, but I see this as a
potential problem. Also, one can force a full backup if the required WAL
files are missing for whatever other reasons, not necessarily those
mentioned above.
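
A toy model of the growth, only to illustrate the concern. It is not the
BackupLogCleaner logic: `retained_wals` is a hypothetical helper, and it
assumes one WAL is rolled per hour and that a WAL must be kept until a
successful backup newer than it exists.

```python
from typing import List


def retained_wals(wal_roll_hours: List[int], last_successful_backup_hour: int) -> List[int]:
    """WALs rolled after the last successful backup cannot be cleaned yet."""
    return [h for h in wal_roll_hours if h > last_successful_backup_hour]


last_ok = 2          # last successful backup finished at hour 2
retained_counts = []
for now in (4, 8, 12):
    # One WAL rolled per hour up to "now" (assumption).
    rolled_so_far = list(range(1, now + 1))
    # Every backup attempted after hour 2 fails, so last_ok never advances.
    retained_counts.append(len(retained_wals(rolled_so_far, last_ok)))

print(retained_counts)
```

The retained count keeps increasing with every hour in which no backup
succeeds, which is the unbounded growth described above.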

---
Mallikarjun