I cannot say whether you are crazy or not. Only you know ;)

Now, regarding the number of columns... it depends. If you want to store 800 000 columns of 1MB each, that's almost 800GB in a single row, and therefore in a single region. Forget that! HBase will not split within a row, so you will kill your RegionServer (RS) with such a big region. But if you want to store 800 000 columns of 8 bytes each, that's only about 6MB per row, which is totally doable in recent HBase versions. But think about:
- If you don't need consistency (atomic updates) across those columns, move the CQ (column qualifier) into the row key so the data can be split across regions.
- Group values together if they are accessed together. If you always read 10K values at a time, just put those 10K values together in a single cell.
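Moving the CQ into the row key turns one "wide" row into many "tall" rows that HBase can spread over several regions. Below is a minimal sketch of one possible key layout, assuming the former column qualifiers are integer indexes and the original row keys never contain a 0x00 byte. The layout and the helper names (`tall_row_key`, `split_tall_row_key`, `scan_range`) are hypothetical, not an HBase convention; the only HBase fact it relies on is that row keys are sorted byte-wise.

```python
import struct

# Hypothetical key layout (an assumption, not an HBase convention):
#   <original row key> + 0x00 + <column index as 4-byte big-endian unsigned int>
# The 0x00 separator assumes original row keys never contain a 0x00 byte.
SEPARATOR = b"\x00"
INDEX_WIDTH = 4  # bytes used to encode the former column qualifier


def tall_row_key(row_key: bytes, column_index: int) -> bytes:
    """Turn (row, column) of a wide row into the row key of a tall table."""
    # Big-endian fixed-width encoding makes byte-wise order match numeric order,
    # and HBase sorts row keys byte-wise.
    return row_key + SEPARATOR + struct.pack(">I", column_index)


def split_tall_row_key(key: bytes) -> tuple[bytes, int]:
    """Recover (original row key, column index) from a tall-table row key."""
    # Split by fixed width rather than searching for 0x00, because the encoded
    # index itself may contain 0x00 bytes.
    row_key = key[: -(INDEX_WIDTH + 1)]
    separator = key[-(INDEX_WIDTH + 1) : -INDEX_WIDTH]
    if separator != SEPARATOR:
        raise ValueError("not a tall-table key")
    (column_index,) = struct.unpack(">I", key[-INDEX_WIDTH:])
    return row_key, column_index


def scan_range(row_key: bytes) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) rows covering one former wide row."""
    # Every tall key for row_key starts with row_key + 0x00, and all of them
    # sort before row_key + 0x01.
    return row_key + SEPARATOR, row_key + b"\x01"


if __name__ == "__main__":
    keys = [tall_row_key(b"user42", i) for i in (256, 2, 1)]
    print(sorted(keys) == [tall_row_key(b"user42", i) for i in (1, 2, 256)])  # True
    print(split_tall_row_key(tall_row_key(b"user42", 799_999)))  # (b'user42', 799999)
```

Reading back what used to be one row then becomes a range scan between the two keys returned by `scan_range`. The trade-off is the one mentioned above: HBase only guarantees atomic updates within a single row, so the former columns can no longer be updated together atomically.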
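Grouping values that are always read together means serializing them into one byte array and writing that as a single cell. A side benefit is less per-cell overhead, since HBase stores the row key, column family, qualifier and timestamp alongside every cell. Here is a minimal sketch, assuming every value is a signed 64-bit integer (the "8 bytes columns" case) and using a hypothetical qualifier scheme of one 4-byte block index per cell; the function names are illustrative only.

```python
import struct

# Assumption for illustration: each value is a signed 64-bit integer (8 bytes).
VALUE_SIZE = struct.calcsize(">q")  # 8


def pack_block(values: list[int]) -> bytes:
    """Serialize many 8-byte values into one byte string, stored as a single cell."""
    return struct.pack(f">{len(values)}q", *values)


def unpack_block(blob: bytes) -> list[int]:
    """Inverse of pack_block."""
    if len(blob) % VALUE_SIZE:
        raise ValueError("blob length is not a multiple of 8")
    return list(struct.unpack(f">{len(blob) // VALUE_SIZE}q", blob))


def block_qualifier(block_index: int) -> bytes:
    """Hypothetical qualifier naming scheme: 4-byte big-endian block index."""
    return struct.pack(">I", block_index)


def group_into_blocks(values: list[int], block_size: int = 10_000) -> dict[bytes, bytes]:
    """Split a wide row's values into blocks of block_size, one cell per block."""
    return {
        block_qualifier(i // block_size): pack_block(values[i : i + block_size])
        for i in range(0, len(values), block_size)
    }
```

With the default block size, 800 000 values become 80 cells of 80 000 bytes each instead of 800 000 tiny cells. The block size should follow the actual read pattern: if reads always fetch 10K values at once, a 10K block means one cell per read.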
Also, keep in mind that the current MR implementation may OOME (throw an OutOfMemoryError) if there are too many columns... A fix is coming, but it is not ready yet.

Now, regarding column families, use them only if you need them. Very different access patterns or data formats (JPG vs plain text, etc.) can justify another column family, but most of the time you can do everything you need with a single one...

HTH,

JMS

2015-12-01 6:48 GMT-05:00 Marko Dinic <[email protected]>:
> Hi everyone,
>
> I'm new to HBase and I have a simple question - is 800.000 columns a lot to
> be stored in a single column family?
>
> This data will mostly be processed as MR jobs.
>
> My guess is that it is not, since all the values are stored in single
> Region, so there shouldn't be a problem.
>
> Is there any limit to number of columns in a column family?
>
> --
> Marko Dinic
>
