@amit: Thanks.

@Dave: Thanks. Yes, there's no need for a reduce step here; I just have to put the values in the map phase.

@all:
Just in case I don't use the client-side cache, what is the ideal
writeBufferSize?
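On that question: when Puts are not batched on the client, flush frequency is governed by the client write buffer, which defaults to 2 MB in the 0.20/0.90-era clients. A sketch of trying a larger buffer, say 8 MB (the figure is only an illustration, not a recommendation):

```xml
<!-- hbase-site.xml (client side): enlarge the write buffer so fewer,
     larger RPCs reach the region servers. 8388608 (8 MB) is only an
     example; the default is 2097152 (2 MB). Bigger buffers trade client
     and server memory for throughput, so measure rather than guess. -->
<property>
  <name>hbase.client.write.buffer</name>
  <value>8388608</value>
</property>
```

The same value can also be set per table with HTable.setWriteBufferSize(long); note that the buffer only comes into play after HTable.setAutoFlush(false), otherwise each Put is sent immediately.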

 

amit jaiswal wrote:
> 
> Hi,
> 
> MR would be a better option because it will definitely distribute the disk
> I/O. The default HBase client gives very low write throughput, and MR plus
> a multithreaded client would be good.
> 
> I guess Facebook has the infrastructure to directly create the HFiles
> required for HBase. (I remember something like that in their talk 'HBase
> at Facebook' at Hadoop World.) That would be the ideal case for bulk
> loading any external data directly into HBase, because it bypasses the
> entire caching/WAL layer, and it is also an ideal candidate for an MR job.
> 
> http://vimeo.com/16350544
> 
> -regards
> Amit
> 
> 
> ----- Original Message ----
> From: rajgopalv <[email protected]>
> To: [email protected]
> Sent: Thu, 2 December, 2010 5:59:06 PM
> Subject: Re: Inserting Random Data into HBASE
> 
> 
> @Mike:
> I am using the client-side cache. I collect the Puts in an ArrayList and
> put them together using HTable.put(List<Put> puts).
> 
> @Dave:
> MR seems to be a good idea.
> I'm relatively new to HBase and haven't worked on a real-world HBase
> cluster. So to begin with, could you recommend a cluster size? (I'm
> thinking of 5; should I have more? I'll be using EC2 machines and EBS for
> storage. Is that fine?) And a replication factor of 3 will be sufficient,
> right?
> 
> @Alex Baranau: What is a good bufferSize? I'm using the default.
> 
> @amit: Thanks, man. But MR seems to be a better option, right?
> 
> 
> rajgopalv wrote:
>> 
>> Hi, 
>> I have to test how long HBase takes to store 100 million records.
>>
>> So I wrote a simple Java program which:
>>
>> 1 : generates a random key, 10 columns per key, and random values for
>> the 10 columns.
>> 2 : makes a Put object out of these and stores it in an ArrayList.
>> 3 : when the ArrayList's size reaches 5000, calls table.put(listOfPuts).
>> 4 : repeats until 100 million records have been put.
>>
>> And I run this as a single-threaded Java program.
>>
>> Am I doing it right? Is there any other way of importing large data for
>> testing? [For now I'm not considering bulk data import/loadtable.rb etc.
>> Apart from this, is there any other way?]
>> 
>> 
>> 
> 
> -- 
> View this message in context: 
> http://old.nabble.com/Inserting-Random-Data-into-HBASE-tp30349594p30357933.html
> Sent from the HBase User mailing list archive at Nabble.com.
> 
> 
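A minimal sketch of the batching loop described in the quoted message, with plain java.util stand-ins replacing the HBase Put and HTable types so the flush-every-5000 logic can run on its own (in the real program each record becomes an org.apache.hadoop.hbase.client.Put and flushBatch would call HTable.put(List)):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Schematic of the single-threaded insertion loop: collect records in a
// list, flush the list to the table whenever it reaches BATCH_SIZE, and
// flush any final partial batch at the end.
public class BatchInsertSketch {
    static final int BATCH_SIZE = 5000;
    static int flushes = 0;      // counts simulated HTable.put(List) calls
    static long rowsWritten = 0; // total rows pushed through the batches

    static void flushBatch(List<String> batch) {
        // Real code: table.put(batch), where batch holds up to BATCH_SIZE Puts.
        flushes++;
        rowsWritten += batch.size();
        batch.clear();
    }

    public static void main(String[] args) {
        long total = 1_000_000;  // 100 million in the original test
        Random rnd = new Random(42);
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        for (long i = 0; i < total; i++) {
            // Real code: new Put(randomKey), with 10 random columns added.
            batch.add("row-" + rnd.nextLong());
            if (batch.size() >= BATCH_SIZE) {
                flushBatch(batch);
            }
        }
        if (!batch.isEmpty()) {
            flushBatch(batch);   // flush the final partial batch
        }
        System.out.println(flushes + " flushes, " + rowsWritten + " rows");
    }
}
```

With the batch size of 5000 from the thread, 1,000,000 rows produce exactly 200 flushes; the same structure scales to the 100 million-row test, and an MR job would simply run this loop independently in each map task.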

-- 
View this message in context: 
http://old.nabble.com/Inserting-Random-Data-into-HBASE-tp30349594p30366548.html
Sent from the HBase User mailing list archive at Nabble.com.
