Well, doing some calculations: you have 18 TB of data divided into 9200
regions, so approximately 2 GB per region. Is this correct?
Well, my first piece of advice is to disable the automatic split
mechanism in HBase and manage the splits manually; otherwise you will
end up with an insane number of regions in a short time.
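As a rough sketch of the manual approach, you can pre-create the table
with explicit split points (the table name, family name, and split keys
below are just placeholders for illustration). To keep the cluster from
splitting on its own, you would also raise hbase.hregion.max.filesize
well above your expected region size in hbase-site.xml on the region
servers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Placeholder table and family names -- replace with your own.
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("cf"));

            // Explicit split points taken from the key ranges you expect;
            // these example keys are purely illustrative.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("row-3000000000"),
                Bytes.toBytes("row-6000000000"),
                Bytes.toBytes("row-9000000000")
            };
            admin.createTable(desc, splits);
        }
    }

With the splits chosen from the real key distribution of your data, each
region starts out with roughly the same amount of work during the import.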
The second is to enable compression (Gzip, LZO, or Snappy) across your
HBase tables. That gives you less data to move around and less network
overhead.
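A minimal sketch of what that looks like when you define the table
(again, "mytable" and "cf" are placeholder names, and the Snappy or LZO
native libraries must be installed on every node for the setting to work):

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.io.hfile.Compression;

    public class CompressedFamilyExample {
        // Builds a table descriptor whose single family uses Snappy.
        static HTableDescriptor buildDescriptor() {
            HTableDescriptor desc = new HTableDescriptor("mytable");
            HColumnDescriptor family = new HColumnDescriptor("cf");
            family.setCompressionType(Compression.Algorithm.SNAPPY);
            desc.addFamily(family);
            return desc;
        }
    }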
Omer, one of the software engineers at the LA Hadoop User Group, gave an
excellent talk about HBase called "HBase Do's and Don'ts". I recommend
that you watch it.
First, see the post on Cloudera's blog:
http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
- Video
http://www.meetup.com/LA-HUG/pages/Video_from_April_13th_HBASE_DO%27S_and_DON%27TS/
On 3/31/2012 5:33 AM, Rita wrote:
I have close to 9200 regions. Is there an example I can follow, or are
there tools that do this already?
On Fri, Mar 30, 2012 at 10:11 AM, Marcos Ortiz <[email protected]> wrote:
On 03/30/2012 04:54 AM, Rita wrote:
Thanks for the responses. I am using 0.90.4-cdh3. I exported the table
using the HBase export tool. Yes, the previous table still exists, but on
a different cluster. My region servers are large, close to 12 GB in size.
What is the total number of regions you have?
I want to understand the HFile part. We export the table as a series of
HFiles and then import them back in?
Yes. The simplest way to do this is with TableOutputFormat, but if you
use HFileOutputFormat instead, the process will be more efficient,
because this feature (bulk load) uses less CPU and network. With a
MapReduce job, you prepare your data using HFileOutputFormat (Hadoop's
TotalOrderPartitioner class is used to partition the map output
into disjoint ranges of the key space, corresponding to the key
ranges of the regions in the table).
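In rough terms (a sketch, not the exact job you would run), assuming the
export was written as sequence files to /export/mytable on HDFS and the
target table is named "mytable" (both names are placeholders), the
prepare step looks something like this:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadPrepare {

        // Turns each exported Result back into its individual KeyValues,
        // keyed by row, which is what HFileOutputFormat expects.
        static class ExportToKeyValueMapper extends
                Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, KeyValue> {
            @Override
            protected void map(ImmutableBytesWritable row, Result result, Context context)
                    throws IOException, InterruptedException {
                for (KeyValue kv : result.raw()) {
                    context.write(row, kv);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "prepare-hfiles-for-mytable");
            job.setJarByClass(BulkLoadPrepare.class);

            job.setMapperClass(ExportToKeyValueMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(KeyValue.class);

            job.setInputFormatClass(SequenceFileInputFormat.class);
            FileInputFormat.addInputPath(job, new Path("/export/mytable"));
            FileOutputFormat.setOutputPath(job, new Path("/tmp/mytable-hfiles"));

            // Wires in HFileOutputFormat, the TotalOrderPartitioner and a
            // sorting reducer, so the output HFiles line up with the key
            // ranges of the existing regions of "mytable".
            HTable table = new HTable(conf, "mytable");
            HFileOutputFormat.configureIncrementalLoad(job, table);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

This job only writes HFiles under the output directory; it does not touch
the table yet, so if it fails you can rerun just this step.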
What is the difference between that and the regular MR export job?
The main difference from a regular MR job is the output: instead of
using the classic output formats like TextOutputFormat,
MultipleOutputFormat, SequenceFileOutputFormat, etc., you use
HFileOutputFormat, which writes HBase's native data file format
(HFile).
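Once the HFiles have been written (to /tmp/mytable-hfiles in the sketch
above; again an assumed path), the final step just moves them into the
table's regions, which is cheap compared with pushing the data through
the regular write path. A rough sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class CompleteBulkLoad {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");  // assumed table name
            // Moves the prepared HFiles into the table's regions.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("/tmp/mytable-hfiles"), table);
        }
    }

The same step can also be run from the command line with the
completebulkload tool that ships with HBase.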
The idea sounds good because it seems simple on the surface :-)
On Fri, Mar 30, 2012 at 12:08 AM, Stack <[email protected]> wrote:
On Thu, Mar 29, 2012 at 7:57 PM, Rita <[email protected]> wrote:
Hello,
I am importing a 40+ billion row table which I exported several months
ago.
The data size is close to 18 TB on HDFS (3x replication).
Does the table from back then still exist? Or do you remember what
the key spread was like? Could you precreate the old table?
My problem is that when I try to import it with MapReduce it takes a few
days -- which is OK -- however, when the job fails for whatever reason, I
have to restart everything. Is it possible to import the table in chunks,
like importing 1/3, then 2/3, and then finally 3/3 of the table?
Yeah. Funny how the plug gets pulled on the rack when the three-day
job is 95% done.
Btw, the job creates close to 150k mapper tasks; that's a problem waiting
to happen :-)
Are you running 0.92? If not, you should be, and go for bigger regions. 10 GB?
St.Ack
--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
Data Engineer at UCI
http://marcosluis2186.posterous.com
--
--- Get your facts first, then you can distort them as you please.
--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
Data Engineer at UCI
http://marcosluis2186.posterous.com