Pablo,

That is correct.

On Mon, Aug 5, 2013 at 10:00 AM, Pablo Medina <[email protected]> wrote:

> Lars,
>
> when you say 'when one memstore needs to be flushed all other column
> families are flushed', you are referring to other column families of the
> same table, right?
>
>
> 2013/8/4 Rohit Kelkar <[email protected]>
>
> > Regarding the slow scan: only fetch the columns/qualifiers that you need.
> > It may be that you are fetching a whole lot of data that you don't need.
> > Try scan.addColumn() and let us know.
> >
> > - R
> >
> > On Sunday, August 4, 2013, lars hofhansl wrote:
> >
> > > BigTable has one more level of abstraction: Locality Groups.
> > > A Column Family in HBase is both a Column Family and a Locality Group:
> > > it is a group of columns *and* it defines storage parameters
> > > (compression, versions, TTL, etc).
> > >
> > > As to how many make sense: it depends.
> > > If you can group your columns such that a scan is often limited to a
> > > single Column Family, you'll get a huge benefit by using more Column
> > > Families.
> > > The main consideration with many Column Families is that each has its
> > > own store files, and hence scanning involves more seeking for each
> > > Column Family included in a scan.
> > >
> > > They are also flushed together; when one memstore (which is per Column
> > > Family) needs to be flushed, all other Column Families are also flushed,
> > > leading to many small files until they are compacted. If all your Column
> > > Families are roughly the same size this is less of a problem. It's also
> > > possible to mitigate this by tweaking the compaction policies.
> > >
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > > From: Vimal Jain <[email protected]>
> > > To: [email protected]
> > > Sent: Saturday, August 3, 2013 11:28 PM
> > > Subject: Re: How many column families in one table ?
> > >
> > >
> > > Hi,
> > > I have tested read performance after reducing the number of column
> > > families from 14 to 3 and yes, there is an improvement.
> > > Meanwhile I was going through the paper published by Google on BigTable.
> > > It says
> > >
> > > "It is our intent that the number of distinct column
> > > families in a table be small (in the hundreds at most), and
> > > that families rarely change during operation."
> > >
> > > So is that a theoretical value (100 CFs), or is it possible but not with
> > > the current version of HBase?
> > >
> > >
> > > On Tue, Jul 2, 2013 at 12:48 AM, Viral Bajaria <[email protected]> wrote:
> > >
> > > > On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain <[email protected]> wrote:
> > > >
> > > > > Sorry for the typo .. please ignore previous mail.. Here is the
> > > > > corrected one..
> > > > > 1) I have around 140 columns for each row. Out of 140, around 100
> > > > > columns hold Java primitive data types; the remaining 40 columns
> > > > > contain a serialized Java object as a byte array (inside each object
> > > > > is an ArrayList). Yes, I do delete data, but the frequency is very
> > > > > low (1 out of 5K operations). I don't run any compaction.
> > > > >
> > > >
> > > > This answers the type of data in each cell, not the size of data. Can
> > > > you figure out the average size of data that you insert in that size.
> > > > For example, what is the length of the byte array ? Also, for the Java
> > > > primitives, is it an 8-byte long ? A 4-byte int ?
> > > > In addition to that, what is in the row key ? How long is that in
> > > > bytes ? Same for column family: can you share the names of the column
> > > > families ? How about qualifiers ?
> > > >
> > > > If you have disabled major compactions, you should run one every few
> > > > days (if not once a day) to consolidate the # of files that each scan
> > > > will have to open.
> > > >
> > > > > 2) I ran the scan keeping in mind the CPU, IO and other system
> > > > > related parameters. I found them to be normal, with system load
> > > > > being 0.1-0.3.
> > > > >
> > > >
> > > > How many disks do you have in your box ? Have you ever benchmarked the
> > > > hardware ?
> > > >
> > > > Thanks,
> > > > Viral
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
> > >

--
Kevin O'Dell
Systems Engineer, Cloudera
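
To make the flush behaviour confirmed above a bit more concrete, here is a rough toy model in plain Java. It is not HBase code and uses no HBase API: the class name FlushModel, the method simulate, the byte counts and the threshold are all hypothetical, and the trigger rule simply follows the description in this thread (one memstore filling up causes every Column Family of the same table to flush). It only shows why unevenly sized families tend to produce many small store files.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "all families flush together"; hypothetical, not HBase code. */
class FlushModel {

    /**
     * Simulates a number of writes to one table.
     * bytesPerWrite[f] is how much each write adds to family f's memstore
     * (made-up numbers). When any single memstore reaches flushThreshold,
     * every non-empty memstore is flushed to its own new store file.
     * Returns, per family, the sizes of the store files that were written.
     */
    static List<List<Integer>> simulate(int[] bytesPerWrite, int flushThreshold, int writes) {
        int families = bytesPerWrite.length;
        int[] memstore = new int[families];
        List<List<Integer>> storeFiles = new ArrayList<>();
        for (int f = 0; f < families; f++) {
            storeFiles.add(new ArrayList<>());
        }

        for (int w = 0; w < writes; w++) {
            boolean flushNeeded = false;
            for (int f = 0; f < families; f++) {
                memstore[f] += bytesPerWrite[f];
                if (memstore[f] >= flushThreshold) {
                    flushNeeded = true;
                }
            }
            if (flushNeeded) {
                // One full memstore forces a flush of all families.
                for (int f = 0; f < families; f++) {
                    if (memstore[f] > 0) {
                        storeFiles.get(f).add(memstore[f]);
                        memstore[f] = 0;
                    }
                }
            }
        }
        return storeFiles;
    }

    public static void main(String[] args) {
        // One large family and two small ones (assumed sizes).
        System.out.println("skewed: " + simulate(new int[] {100, 5, 5}, 1000, 100));
        // Three families of roughly the same size.
        System.out.println("even:   " + simulate(new int[] {100, 100, 100}, 1000, 100));
    }
}
```

With the made-up numbers in main, each of the two small families ends up with ten 50-byte files next to the large family's ten 1000-byte files, while the evenly sized families all write full-size files. Until a compaction merges them, a scan touching the small families has that many more files to open.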
