Re: Should Taking A Snapshot Work Even If Balancer Is Moving A Few Regions Around?

2018-03-23 Thread Saad Mufti
Thanks.


Saad


On Wed, Mar 21, 2018 at 3:04 PM, Ted Yu  wrote:

> Looking at
> hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java in
> branch-1.4:
>
>   boolean[] setSplitOrMergeEnabled(final boolean enabled, final boolean synchronous,
>       final MasterSwitchType... switchTypes) throws IOException;
>
>   boolean isSplitOrMergeEnabled(final MasterSwitchType switchType) throws IOException;
>
> Please also see the following script:
>
> hbase-shell/src/main/ruby/shell/commands/splitormerge_switch.rb
>
> FYI
>
> On Wed, Mar 21, 2018 at 11:33 AM, Vladimir Rodionov  wrote:
>
> > >> So my question is whether taking a snapshot is supposed to work even
> > >> with regions being moved around. In our case it is usually only a
> > >> couple here and there.
> >
> > No, if a region is moved, split, or merged during the snapshot operation,
> > the snapshot will fail.
> > This is why taking snapshots of a large table is a 50/50 game.
> >
> > Disabling the balancer, region merging, and region splitting before the
> > snapshot should help. This works in 2.0.
> >
> > Not sure if the merge/split switch is available in 1.4.
> >
> > -Vlad
> >
> > On Tue, Mar 20, 2018 at 8:00 PM, Saad Mufti  wrote:
> >
> > > Hi,
> > >
> > > We are using HBase 1.4.0 on an AWS EMR based cluster. Since snapshots
> > > are stored in S3, they take much longer than with local disk. We have
> > > a cron script that takes regular snapshots as backups, and they fail
> > > quite often on our largest table, which takes close to an hour to
> > > snapshot.
> > >
> > > The only thing I have usually noticed in the errors is a message about
> > > a region moving or closing.
> > >
> > > So my question is whether taking a snapshot is supposed to work even
> > > with regions being moved around. In our case it is usually only a
> > > couple here and there.
> > >
> > > Thanks.
> > >
> > > Saad
> > >
> >
>
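[Editor's note] To make the advice above concrete, here is a minimal sketch (not a definitive recipe) that combines the branch-1.4 Admin calls Ted quoted with the balancer switch around a snapshot. The connection setup, snapshot name, and table name are placeholders, and it assumes the 1.4 API where MasterSwitchType is the nested Admin.MasterSwitchType enum:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class QuiescedSnapshot {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Pause the balancer and the split/merge switches so regions stay put
          // while the snapshot runs.
          boolean balancerWasOn = admin.setBalancerRunning(false, true);
          admin.setSplitOrMergeEnabled(false, true,
              Admin.MasterSwitchType.SPLIT, Admin.MasterSwitchType.MERGE);
          try {
            // Placeholder snapshot and table names.
            admin.snapshot("my_table_snapshot", TableName.valueOf("my_table"));
          } finally {
            // Restore the previous settings even if the snapshot fails.
            admin.setSplitOrMergeEnabled(true, true,
                Admin.MasterSwitchType.SPLIT, Admin.MasterSwitchType.MERGE);
            admin.setBalancerRunning(balancerWasOn, true);
          }
        }
      }
    }

The same switches are exposed in the shell via the splitormerge_switch command referenced above and balance_switch, which a cron-driven snapshot script would more likely use.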


Anyone Have A Workaround For HBASE-19681?

2018-03-23 Thread Saad Mufti
We are facing the exact same symptoms in HBase 1.4.0 running on an AWS EMR
based cluster, and we desperately need to take a snapshot to feed a
downstream job. So far we have tried using the "assign" command on all the
regions involved to move them around, but the snapshot still fails. We also
saw the same error earlier in a compaction thread, on the same missing file.
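
[Editor's note] For reference, the re-assignment pass described above corresponds roughly to the following sketch against the 1.4 Java Admin API; the table name is a placeholder, and admin.assign() is roughly what the shell "assign" command does for each region:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class ReassignTableRegions {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Placeholder table name.
          List<HRegionInfo> regions = admin.getTableRegions(TableName.valueOf("my_table"));
          for (HRegionInfo region : regions) {
            // Ask the master to (re)assign the region, like the shell "assign" command.
            admin.assign(region.getRegionName());
          }
        }
      }
    }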

Is there any way we can recover this database? We ran hbck -details and it
reported no errors.

Thanks.


Saad


ASYNC_WAL and visibility

2018-03-23 Thread Viacheslav Krot
Hi all,
I cannot find any good information about ASYNC_WAL semantics.
As far as I understand, the worst I can get is data loss in case of a
region server failure.
But the question I have is: can it affect visibility in any way? Does it
still follow the principle "When a client receives a "success" response for
any mutation, that mutation is immediately visible to both that client and
any client with whom it later communicates through side channels"?
Another question: is data loss possible in any situation other than a
region server failure?
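
[Editor's note] For reference, ASYNC_WAL is a per-mutation Durability setting. A minimal sketch of setting it on a write (table, family, and row names are placeholders) might look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AsyncWalPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
          Put put = new Put(Bytes.toBytes("row-1"));
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
          // With ASYNC_WAL the edit is written to the WAL asynchronously, so the
          // client can see "success" before the edit is synced; a region server
          // crash in that window is where data loss can occur.
          put.setDurability(Durability.ASYNC_WAL);
          table.put(put);
        }
      }
    }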

