Re: importing a large table

Rita Sat, 31 Mar 2012 13:27:07 -0700

Heh. Thanks for the links. I already read the Do's and Don'ts :-). The video's
volume is rather low.



I am already using lzo as my compression method. My regions are set to 30GB
in resident memory.
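
A quick check of the per-region arithmetic for this table (18 TB across roughly 9200 regions). This is only a sketch: it assumes binary units (1 TB = 1024 GB), and since it is ambiguous whether the 18 TB figure is before or after 3x replication, it computes both readings. The function names are made up for illustration.

```python
import math

GB_PER_TB = 1024  # assumption: binary units


def avg_region_gb(total_tb, regions):
    """Average region size in GB for a table of total_tb spread over regions."""
    return total_tb * GB_PER_TB / regions


def regions_needed(total_tb, region_gb):
    """Number of regions needed to hold total_tb at a target region size."""
    return math.ceil(total_tb * GB_PER_TB / region_gb)


# Reading 1: 18 TB is the table size itself.
print("avg region, 18 TB: %.2f GB" % avg_region_gb(18, 9200))
# Reading 2: 18 TB is the on-disk size after 3x replication (6 TB of data).
print("avg region, 6 TB:  %.2f GB" % avg_region_gb(18 / 3, 9200))
# Region counts for the two target sizes mentioned in the thread.
print("regions at 10 GB:", regions_needed(18, 10))
print("regions at 30 GB:", regions_needed(18, 30))
```

Either way the average region is far below a 10 GB or 30 GB target, so fewer, larger regions look feasible on paper.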




On Sat, Mar 31, 2012 at 1:19 PM, Marcos Ortiz <[email protected]> wrote:

> Well, doing some calculations: you have 18 TB of data divided into 9200
> regions, so you have approximately 2 GB per region. Is this correct?
>
> Well, my first advice is to disable the automatic split mechanism in
> HBase. It is better to split manually; otherwise you will have an
> insane number of regions in a short time.
>
> The second is to enable compression (Gzip, LZO, Snappy) across your whole
> HBase cluster. This gives you less data to work with and less network
> overhead.
>
> Omer, one of the software engineers at the LA Hadoop User Group, gave an
> excellent talk about HBase called "HBase Do's and Don'ts". I recommend
> that you watch this talk.
>
> See the post first on Cloudera's blog:
> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>
> - Video
> http://www.meetup.com/LA-HUG/pages/Video_from_April_13th_HBASE_DO%27S_and_DON%27TS/
>
>
>
> On 3/31/2012 5:33 AM, Rita wrote:
>
>> I have close to 9200 regions. Is there an example I can follow? Or are
>> there tools to do this already?
>>
>>
>>
>> On Fri, Mar 30, 2012 at 10:11 AM, Marcos Ortiz <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>
>>
>>    On 03/30/2012 04:54 AM, Rita wrote:
>>
>>>    Thanks for the responses. I am using 0.90.4-cdh3. I exported the table
>>>    using the HBase exporter. Yes, the previous table still exists, but on a
>>>    different cluster. My region servers are large, close to 12GB in size.
>>>
>>    What is the total number of your regions?
>>
>>>    I want to understand more about HFiles. We export the table as a
>>>    series of HFiles and then import them in?
>>>
>>    Yes. The simplest way to do this is with TableOutputFormat, but
>>    if you use HFileOutputFormat instead, the process will be more
>>    efficient, because bulk loading uses less CPU and network. With a
>>    MapReduce job, you prepare your data using HFileOutputFormat
>>    (Hadoop's TotalOrderPartitioner class is used to partition the map
>>    output into disjoint ranges of the key space, corresponding to the
>>    key ranges of the regions in the table).
>>
>>
>>>    What is the difference between that and the
>>>    regular MR export job?
>>>
>>    The main difference from regular MR jobs is the output: instead of
>>    using the classic output formats like TextOutputFormat,
>>    MultipleOutputFormat, SequenceFileOutputFormat, etc., you use
>>    HFileOutputFormat, which writes the native data file type for HBase
>>    (HFile).
>>
>>>    The idea sounds good because it sounds simple on the
>>>    surface :-)
>>>
>>
>>
>>>    On Fri, Mar 30, 2012 at 12:08 AM, Stack<[email protected]>  <mailto:
>>> [email protected]>  wrote:
>>>
>>>     On Thu, Mar 29, 2012 at 7:57 PM, Rita<[email protected]>
>>>>  <mailto:[email protected]>  wrote:
>>>>
>>>>>    Hello,
>>>>>
>>>>>    I am importing a 40+ billion row table which I exported several
>>>>>    months ago. The data size is close to 18TB on HDFS (3x replication).
>>>>>
>>>>    Does the table from back then still exist?  Or do you remember what
>>>>    the key spread was like?  Could you precreate the old table?
>>>>
>>>>>    My problem is that when I try to import it with MapReduce it takes a
>>>>>    few days, which is OK. However, when the job fails for whatever
>>>>>    reason, I have to restart everything. Is it possible to import the
>>>>>    table in chunks, like importing 1/3, 2/3, and then finally 3/3 of
>>>>>    the table?
>>>>>
>>>>    Yeah.  Funny how the plug gets pulled on the rack when the three-day
>>>>    job is 95% done.
>>>>
>>>>>    Btw, the job creates close to 150k mapper tasks; that's a problem
>>>>>    waiting to happen :-)
>>>>>
>>>>    Are you running 0.92?  If not, you should, and go for bigger
>>>>    regions.   10G?
>>>>
>>>>    St.Ack
>>>>
>>>>
>>    --
>>    Marcos Luis Ortíz Valmaseda (@marcosluis2186)
>>      Data Engineer at UCI
>>      
>>      http://marcosluis2186.posterous.com
>>
>>
>>    <http://www.uci.cu/>
>>
>>
>>
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
> --
> Marcos Luis Ortíz Valmaseda (@marcosluis2186)
>  Data Engineer at UCI
>  http://marcosluis2186.posterous.com
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
>



-- 
--- Get your facts first, then you can distort them as you please.--
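
On the original question of importing in chunks, here is a minimal sketch of the bookkeeping only. It assumes the export is a directory of many part files and that an import run can be pointed at a subset of them; neither is confirmed in this thread. The file names and helper functions are hypothetical, and no HBase or Hadoop API is called.

```python
def split_into_chunks(paths, n_chunks):
    """Split paths into n_chunks contiguous slices of nearly equal length."""
    paths = sorted(paths)
    base, extra = divmod(len(paths), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional file.
        size = base + (1 if i < extra else 0)
        chunks.append(paths[start:start + size])
        start += size
    return chunks


def pending_chunks(chunks, done):
    """Return (index, files) pairs for chunks whose index is not in done."""
    return [(i, c) for i, c in enumerate(chunks) if i not in done]


# Hypothetical export file names, for illustration only.
files = ["part-m-%05d" % i for i in range(10)]
chunks = split_into_chunks(files, 3)

# Suppose chunk 0 was imported before a failure: only 1 and 2 remain.
for index, chunk in pending_chunks(chunks, done={0}):
    print("chunk %d: %d files, %s .. %s" % (index, len(chunk), chunk[0], chunk[-1]))
```

With the set of finished chunk indexes kept somewhere durable, a failed run would only need to redo the chunk it was working on instead of the whole table.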
