@amit: thanks. @Dave: thanks. Yes, there is no need for a reduce step here; the values can be written straight from the map phase (a rough map-only sketch follows).
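To make the map-only idea concrete, here is a minimal sketch of such a job. It is only an illustration: the table name "test_table", the column family "cf", and the use of a plain text seed file as input (one input line per generated row, just to drive the number of map tasks) are all assumptions, not something from the thread.

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class RandomLoad {

  /** Map-only: every input line becomes one row with 10 random columns. */
  static class RandomPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      byte[] row = Bytes.toBytes(UUID.randomUUID().toString());  // random row key
      Put put = new Put(row);
      for (int i = 0; i < 10; i++) {
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col" + i),
                Bytes.toBytes(Math.random()));                   // random value
      }
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "random-load");
    job.setJarByClass(RandomLoad.class);
    job.setMapperClass(RandomPutMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    TextInputFormat.addInputPath(job, new Path(args[0]));        // seed file(s)
    // Wires up TableOutputFormat against the target table; no reducer class.
    TableMapReduceUtil.initTableReducerJob("test_table", null, job);
    job.setNumReduceTasks(0);                                    // map-only
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The number of concurrent writers is then just the number of map tasks, which the input splits control, so the write load spreads across the cluster instead of coming from one single-threaded client.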
@all: in case I don't use the client-side cache, what is the ideal writeBufferSize? (A rough configuration sketch follows after the quoted thread below.)

amit jaiswal wrote:
>
> Hi,
>
> MR would be a better option because it will definitely distribute the disk I/O.
> The default HBase client gives very low write throughput, and MR plus a
> multithreaded client would be good.
>
> I guess Facebook has the infrastructure to directly create the HFiles required
> for HBase (I remember something like that in their talk 'HBase at Facebook' at
> Hadoop World). That would be the ideal case for bulk loading any external data
> directly into HBase, because it can bypass the entire caching/WAL layer, and it
> is also an ideal candidate for an MR job.
>
> http://vimeo.com/16350544
>
> -regards
> Amit
>
> ----- Original Message ----
> From: rajgopalv <[email protected]>
> To: [email protected]
> Sent: Thu, 2 December, 2010 5:59:06 PM
> Subject: Re: Inserting Random Data into HBASE
>
> @Mike: I am using the client-side cache. I collect the Puts in an ArrayList and
> put them together using HTable.put(List l).
>
> @Dave: MR seems to be a good idea.
> I'm relatively new to HBase and haven't worked on a real-world HBase cluster.
> So to begin with, could you recommend a cluster size? (I'm thinking of 5 nodes;
> should I have more? I'll be using EC2 machines and EBS for storage; is that
> fine? And a replication factor of 3 will be sufficient, right?)
>
> @Alex Baranau: What is a good bufferSize? I'm using the default.
>
> @amit: Thanks. But MR seems to be a better option, right?
>
> rajgopalv wrote:
>>
>> Hi,
>> I have to test HBase to see how long it takes to store 100 million records.
>>
>> So I wrote a simple Java program which:
>>
>> 1. generates a random key, with 10 columns per key and random values for
>>    those 10 columns;
>> 2. makes a Put object out of these and stores it in an ArrayList;
>> 3. when the ArrayList's size reaches 5000, calls table.put(listOfPuts);
>> 4. repeats until 100 million records have been put.
>>
>> I run this Java program single-threaded.
>>
>> Am I doing it right? Is there any other way of importing large data for
>> testing? [For now I'm not considering bulk data import / loadtable.rb etc.;
>> apart from that, is there any other way?]
>>
>>
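On my own writeBufferSize question above: as far as I understand, writeBufferSize only comes into play when the client-side buffer is actually used (autoFlush off, or Puts handed over as a list), so without that buffer the setting has little effect. Roughly how the buffer is configured; the table name is a placeholder, the 12 MB figure is only an example to tune against, and the shipped default is about 2 MB:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");      // placeholder table name
    table.setAutoFlush(false);                          // keep Puts in the client buffer
    table.setWriteBufferSize(12 * 1024 * 1024);         // e.g. 12 MB; default is ~2 MB
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(i));
      table.put(put);                                   // sent when the buffer fills up
    }
    table.flushCommits();                               // push whatever is still buffered
    table.close();
  }
}

A bigger buffer means fewer, larger RPCs, at the cost of more client memory and more data at risk if the client dies before a flush.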

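For reference on the HFile path Amit mentions in the quoted reply: the usual shape, assuming an HBase version where HFileOutputFormat.configureIncrementalLoad and the completebulkload tool are available, is an MR job whose mapper emits Puts, with configureIncrementalLoad wiring up the sort and partitioning against the table's regions. Table and path names below are placeholders; a rough driver sketch only:

// Driver sketch; RandomPutMapper is the same kind of mapper as in the map-only job above.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "random-bulkload");
job.setJarByClass(RandomLoad.class);
job.setMapperClass(RandomPutMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setInputFormatClass(TextInputFormat.class);
TextInputFormat.addInputPath(job, new Path("/tmp/seed"));        // placeholder input
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));    // placeholder HFile dir
HTable table = new HTable(conf, "test_table");                   // placeholder table
// Sets the sorting reducer, total-order partitioner and HFile output format.
HFileOutputFormat.configureIncrementalLoad(job, table);
job.waitForCompletion(true);
// Then move the generated HFiles into the table, e.g.:
//   hadoop jar hbase-<version>.jar completebulkload /tmp/hfiles test_table

This writes HFiles directly and loads them into the regions afterwards, so it skips the memstore and WAL entirely, which is why it is the fastest path for a one-off load like this benchmark.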