On Sat, Jan 10, 2015 at 9:08 AM, Rama Ramani <[email protected]> wrote:
> I am looking for a way to avoid regionserver hotspotting while doing a
> bulk load. My input files to ImportTsv are extracted from a relational
> store and have monotonically increasing IDs.
>
> Alternatively, is there a way for ImportTsv to generate its own row key
> (one that does not increase monotonically) and load the column data from
> the input files? If there is no option to bulk load with this tool and
> spread the load, I will just write code to generate the row key and use
> the HBase API for loading. Just wanted to confirm with the experts on
> this DL.
>
> Thanks

You could write out hfiles and do a bulk import of these? See
http://hbase.apache.org/book.html#d0e8022  The writing of the hfiles will
not suffer 'hotspotting'. (A command-line sketch of this two-step flow is
at the foot of this mail.)

Else, subclass the TsvImporterMapper map function, doctor the RDBMS seqid
key by adding a prefix ('salting') or by reversing or hashing it, etc.,
and then specify your customization as the mapper for ImportTsv to use.
(A rough mapper sketch is also appended below.)

St.Ack

> From: Ted Yu
> Sent: Friday, January 9, 2015 2:14 PM
> To: [email protected]
>
> Salted buckets seem to be a concept from other projects, such as
> Phoenix.
>
> Can you be a bit more specific about your requirement?
>
> Cheers
>
> On Fri, Jan 9, 2015 at 12:53 PM, Rama Ramani <[email protected]> wrote:
>
> > Is there a way to specify salted buckets with HBase ImportTsv while
> > doing a bulk load?
> >
> > Thanks
> > Rama
> >
> > From: [email protected]
> > To: [email protected]
> > Subject: RE: HBase - bulk loading files
> > Date: Fri, 19 Dec 2014 14:09:09 -0800
> >
> > HBase 0.98.0.2.1.9.0-2196-hadoop2
> > Hadoop 2.4.0.2.1.9.0-2196
> > Subversion [email protected]:hortonworks/hadoop-monarch.git -r
> > cb50542bc92fb77dee52
> >
> > No, the clusters were not taking additional load.
> >
> > Thanks
> > Rama
> >
> > > Date: Fri, 19 Dec 2014 13:50:30 -0800
> > > Subject: Re: HBase - bulk loading files
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > Can you let us know the HBase and Hadoop versions you're using?
> > >
> > > Were the clusters taking load from other sources when ImportTsv
> > > was running?
> > >
> > > Cheers
> > >
> > > On Fri, Dec 19, 2014 at 1:43 PM, Rama Ramani
> > > <[email protected]> wrote:
> > >
> > > > Hello, I am bulk loading a set of files (about 400MB each) with
> > > > "|" as the delimiter using ImportTsv. It takes a long time for
> > > > the 'map' job to complete on both a 4-node and a 16-node
> > > > cluster. I tried the option to generate bulk output (providing
> > > > -Dimporttsv.bulk.output), and that also took a long time,
> > > > indicating that the generation of the output files itself needs
> > > > improvement. I am seeing about 8000 rows/sec for this dataset;
> > > > the 400MB ingestion takes about 5-6 mins. How can I improve
> > > > this? Is there an alternate tool I can use?
> > > >
> > > > Thanks
> > > > Rama
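P.S. A minimal sketch of the two-step bulk load flow, assuming a table
named 'mytable' with a single family 'cf' and pipe-delimited input under
hdfs:///data/input; the table name, column mapping, and paths are
placeholders, so adjust to taste and check the invocations against your
release:

  # Step 1: have ImportTsv write HFiles instead of doing live puts.
  hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    '-Dimporttsv.separator=|' \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
    -Dimporttsv.bulk.output=hdfs:///tmp/bulkout \
    mytable hdfs:///data/input

  # Step 2: move the finished HFiles into the table's regions.
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    hdfs:///tmp/bulkout mytable

Note this sidesteps hotspotting at write time, but if the keys stay
monotonic all the data still lands in one region; that is what the
salting sketch below addresses.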

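P.P.S. And a rough, untested sketch of the custom-mapper route against
the 0.98-era API. The class name is mine, it assumes the row key is the
first '|'-separated field, and it assumes the target table is presplit
on the two-digit salt prefixes "00" through "15":

  import java.io.IOException;

  import org.apache.hadoop.hbase.mapreduce.TsvImporterMapper;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;

  public class SaltingTsvMapper extends TsvImporterMapper {
    // Match this to the number of presplit regions in the target table.
    private static final int BUCKETS = 16;

    @Override
    public void map(LongWritable offset, Text value, Context context)
        throws IOException {
      String line = value.toString();
      int sep = line.indexOf('|');
      if (sep > 0) {
        String id = line.substring(0, sep);
        // Derive a stable two-digit salt from the id; reversing or
        // hashing the whole id would spread the load just as well.
        int bucket = (id.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        line = String.format("%02d-%s%s", bucket, id, line.substring(sep));
      }
      value.set(line);
      // Let the stock mapper parse the doctored line and emit the Put.
      super.map(offset, value, context);
    }
  }

Hand it to ImportTsv with -Dimporttsv.mapper.class=<fully.qualified.Name>
and ship the jar on the job classpath. Keep in mind that readers must
know the salting scheme too: a get by the original id becomes up to 16
candidate keys, and range scans over the original id ordering are gone.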