I do not believe that to be true.
HBase uses region boundaries only during job setup, to derive the scan range
for each mapper. Those ranges stay valid whether or not the number of regions
changes later. The worst case is that a single mapper ends up scanning
multiple regions (the daughters produced by a split of the region it was
originally assigned).
Regions are unavailable for a short time during a split, but the mappers are
normal HBase clients, so they wait out the split by retrying.
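
To make the point about ranges concrete, here is a toy model in plain Java. It
is not HBase code: SplitRangeSketch, ScanRange and all the method names are
made up for illustration, and regions are just start-row/end-row pairs in a
sorted map. It only shows that a range fixed at job setup still covers the same
keys after a split, and simply overlaps more regions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only (hypothetical names, no HBase classes involved).
final class SplitRangeSketch {

    // A scan range [startRow, stopRow) fixed at job setup.
    // An empty stopRow means "up to the end of the table".
    static final class ScanRange {
        final String startRow;
        final String stopRow;

        ScanRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // One scan range per region, taken from the layout at job setup.
    // The map is region start row -> region end row ("" = end of table).
    static List<ScanRange> computeInputSplits(TreeMap<String, String> regions) {
        List<ScanRange> splits = new ArrayList<>();
        for (Map.Entry<String, String> region : regions.entrySet()) {
            splits.add(new ScanRange(region.getKey(), region.getValue()));
        }
        return splits;
    }

    // Replace the region starting at regionStart with two daughters.
    static void splitRegion(TreeMap<String, String> regions,
                            String regionStart, String splitKey) {
        String regionEnd = regions.get(regionStart);
        regions.put(regionStart, splitKey);
        regions.put(splitKey, regionEnd);
    }

    // Count the regions in the current layout that a range overlaps.
    static int regionsCovering(TreeMap<String, String> regions, ScanRange range) {
        int count = 0;
        for (Map.Entry<String, String> region : regions.entrySet()) {
            String regionStart = region.getKey();
            String regionEnd = region.getValue();
            boolean startsBeforeRangeEnds =
                range.stopRow.isEmpty() || regionStart.compareTo(range.stopRow) < 0;
            boolean endsAfterRangeStarts =
                regionEnd.isEmpty() || regionEnd.compareTo(range.startRow) > 0;
            if (startsBeforeRangeEnds && endsAfterRangeStarts) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Layout at job setup: two regions, ["", "m") and ["m", end).
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "m");
        regions.put("m", "");
        List<ScanRange> splits = computeInputSplits(regions);

        // The second region splits at "t" while the job is running.
        splitRegion(regions, "m", "t");

        for (ScanRange split : splits) {
            System.out.println("range [" + split.startRow + ", " + split.stopRow
                + ") overlaps " + regionsCovering(regions, split) + " region(s)");
        }
    }
}
```

After the split, the range that started at "m" overlaps two regions instead of
one, but its start and stop rows are unchanged, so the mapper still reads
exactly the keys it was assigned.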
-- Lars
From: Flavio Pompermaier <[email protected]>
To: [email protected]
Sent: Friday, October 31, 2014 10:23 AM
Subject: Re: Region split during mapreduce
The problem is that I don't know whether what they say at that link is true
or not.
In the past I have had several problems running mapreduce jobs on a
"live" HBase table, but I didn't know that mapreduce jobs
crash if a region is splitting.
Do I have to create the snapshot myself if I want to use
TableSnapshotInputFormat, or does it handle the creation and deletion of the
snapshot automatically?
Is there any detailed reference on how to deal with such events during
mapreduce jobs?
Thanks for the support,
Flavio
On Fri, Oct 31, 2014 at 6:12 PM, Ted Yu <[email protected]> wrote:
> Flavio:
> Have you considered using TableSnapshotInputFormat ?
>
> See TableMapReduceUtil#initTableSnapshotMapperJob()
>
> Cheers
>
> On Fri, Oct 31, 2014 at 10:01 AM, Flavio Pompermaier <[email protected]>
> wrote:
>
> > Is there anybody here..?
> >
> > On Thu, Oct 30, 2014 at 2:28 PM, Flavio Pompermaier <[email protected]>
> > wrote:
> >
> > > Any help about this..?
> > >
> > > On Wed, Oct 29, 2014 at 9:08 AM, Flavio Pompermaier <[email protected]>
> > > wrote:
> > >
> > >> Hi to all,
> > >> I was reading
> > >>
> > >> http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html?m=1
> > >> and they say " still using
> > >> org.apache.hadoop.hbase.mapreduce.TableInputFormat is a big problem, your
> > >> job will fail when one of HBase Region for target HBase table is splitting
> > >> ! because the original region will be offline by splitting".
> > >>
> > >> Is that true?
> > >> Is there a solution to that?
> > >>
> > >> Best,
> > >> Flavio
> > >>
> > >
> >
>