bq. current MR implementation may OOME if there are too many columns

This is related:
HBASE-14696 Support setting allowPartialResults in mapreduce Mappers

but it is not in any HBase release yet.

FYI

On Tue, Dec 1, 2015 at 7:16 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> I can not say if you are crazy or not. Only you know ;)
>
> Now, regarding the number of columns... it depends...
> If you want to store 800 000 1MB columns, it's almost 800GB for one region.
> Forget that! HBase will not split within a row. So you will kill your RS
> with a region that big. But if you want to store 800 000 8 bytes columns,
> it's only 6MB per row, which is totally doable in recent HBase versions.
> But think about:
> - If no consistency constraint, add the CQ (Column Qualifier) as part of
> the key to be able to split.
> - Regroup some values together if they are accessed together. If you always
> read 10K at a time, just put those 10K together in a single cell.
>
> Also, keep in mind that the current MR implementation may OOME if there are
> too many columns... A fix is coming, but is not ready yet.
>
> Now, regarding column families, use them only if you need them. Very
> different access patterns or data formats (JPG vs plain text, etc.) can
> justify another column family, but most of the time you can do all that you
> need with a single one...
>
> HTH,
>
> JMS
>
> 2015-12-01 6:48 GMT-05:00 Marko Dinic <[email protected]>:
>
> > Hi everyone,
> >
> > I'm new to HBase and I have a simple question - is 800.000 columns a lot to
> > be stored in a single column family?
> >
> > This data will mostly be processed as MR jobs.
> >
> > My guess is that it is not, since all the values are stored in a single
> > Region, so there shouldn't be a problem.
> >
> > Is there any limit to the number of columns in a column family?
> >
> > --
> > Marko Dinic
> >
>
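
For reference, the per-row sizes quoted above (800 000 columns at 8 bytes vs. 1MB each) can be checked with a few lines of plain Java. This is only a rough sketch: the class and method names are made up for illustration, and it counts value bytes only. HBase also stores the row key, family, qualifier and timestamp with every cell, so the real on-disk size would be larger.

```java
public class RowSizeEstimate {
    // Hypothetical helper: raw value bytes in one row,
    // i.e. number of columns times bytes per value.
    // Ignores the per-cell key overhead that HBase adds.
    public static long rowValueBytes(long columns, long bytesPerValue) {
        return Math.multiplyExact(columns, bytesPerValue);
    }

    public static void main(String[] args) {
        long columns = 800_000L;

        // 8-byte values: 6,400,000 bytes, the "6MB per row" case.
        long small = rowValueBytes(columns, 8L);

        // 1 MiB values: 838,860,800,000 bytes, the "almost 800GB" case.
        long large = rowValueBytes(columns, 1024L * 1024L);

        System.out.println("8-byte values: " + small + " bytes per row");
        System.out.println("1 MiB values:  " + large + " bytes per row");
    }
}
```

Since a row is never split across regions, the second number is also a lower bound on the size of the region holding that row.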
