I have close to 9,200 regions. Is there an example I can follow, or are there tools that do this already?
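
For reference, here is what I have pieced together from the docs for the prepare step (untested; the table name and paths are placeholders, and I am assuming the export was written by the stock Export tool as SequenceFiles of Result). Does this look roughly right?

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ExportToHFiles {

      // Export writes SequenceFiles of (row key, Result); turn each Result
      // back into a Put so the bulk-load machinery can sort and write it
      // out as HFiles.
      static class ExportMapper extends
          Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, Put> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
            throws IOException, InterruptedException {
          Put put = new Put(row.get());
          for (KeyValue kv : value.raw()) {
            put.add(kv);
          }
          ctx.write(row, put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "export-to-hfiles");
        job.setJarByClass(ExportToHFiles.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        SequenceFileInputFormat.addInputPath(job, new Path("/exports/mytable"));
        job.setMapperClass(ExportMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        // The target table must already exist with its regions created;
        // configureIncrementalLoad reads the region boundaries and wires up
        // TotalOrderPartitioner so each reducer writes one region's HFiles.
        HTable table = new HTable(conf, "mytable");
        HFileOutputFormat.configureIncrementalLoad(job, table);
        FileOutputFormat.setOutputPath(job, new Path("/bulk/mytable"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }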
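If I understand correctly, the load step afterwards just moves the finished HFiles into the regions. I believe the same thing can be run from the shell as "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /bulk/mytable mytable", but here is the in-code version, again untested:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class CompleteBulkLoad {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Assigns each HFile under /bulk/mytable to the region whose key
        // range contains it, then asks the region servers to adopt them.
        new LoadIncrementalHFiles(conf)
            .doBulkLoad(new Path("/bulk/mytable"), new HTable(conf, "mytable"));
      }
    }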
On Fri, Mar 30, 2012 at 10:11 AM, Marcos Ortiz <[email protected]> wrote:
>
> On 03/30/2012 04:54 AM, Rita wrote:
> > Thanks for the responses. I am using 0.90.4-cdh3. I exported the table
> > using the HBase exporter. Yes, the previous table still exists, but on
> > a different cluster. My region servers are large, close to 12GB in size.
>
> What is the total number of your regions?
>
> > I want to understand regarding HFiles. We export the table as a series
> > of HFiles and then import them in?
>
> Yes. The simplest way to do this is to use TableOutputFormat, but if you
> use HFileOutputFormat instead, the process will be more efficient,
> because this feature (bulk loads) uses less CPU and network. With a
> MapReduce job, you prepare your data using HFileOutputFormat (Hadoop's
> TotalOrderPartitioner class is used to partition the map output into
> disjoint ranges of the key space, corresponding to the key ranges of the
> regions in the table).
>
> > What is the difference between that and the regular MR export job?
>
> The main difference from regular MR jobs is the output: instead of the
> classic output formats like TextOutputFormat, MultipleOutputFormat,
> SequenceFileOutputFormat, etc., you use HFileOutputFormat, which writes
> HBase's native data file format (HFile).
>
> > The idea sounds good because it sounds simple on the surface :-)
> >
> > On Fri, Mar 30, 2012 at 12:08 AM, Stack <[email protected]> wrote:
> >
> > > On Thu, Mar 29, 2012 at 7:57 PM, Rita <[email protected]> wrote:
> > > > Hello,
> > > > I am importing a 40+ billion row table which I exported several
> > > > months ago. The data size is close to 18TB on HDFS (3x replication).
> > >
> > > Does the table from back then still exist? Or do you remember what
> > > the key spread was like? Could you precreate the old table?
> > >
> > > > My problem is that when I try to import it with MapReduce it takes
> > > > a few days -- which is ok -- however, when the job fails for
> > > > whatever reason, I have to restart everything. Is it possible to
> > > > import the table in chunks: import 1/3, 2/3, and then finally 3/3
> > > > of the table?
> > >
> > > Yeah. Funny how the plug gets pulled on the rack when the three-day
> > > job is 95% done.
> > >
> > > > Btw, the job creates close to 150k mapper tasks; that's a problem
> > > > waiting to happen :-)
> > >
> > > Are you running 0.92? If not, you should, and go for bigger regions.
> > > 10G?
> > >
> > > St.Ack
>
> --
> Marcos Luis Ortíz Valmaseda (@marcosluis2186)
> Data Engineer at UCI
> http://marcosluis2186.posterous.com

--
Get your facts first, then you can distort them as you please.
