I cannot say if you are crazy or not. Only you know ;)

Now, regarding the number of columns... it depends.
If you want to store 800 000 columns of 1MB each, that's about 800GB in a
single row, and therefore in a single region. Forget that! HBase will not
split within a row, so you will kill your RS with a region that big. But if
you want to store 800 000 columns of 8 bytes each, it's only about 6MB per
row, which is totally doable in recent HBase versions.
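
To make the arithmetic behind those two cases explicit, here is a rough
sketch. It counts value bytes only and uses decimal units; the helper name
row_bytes is just for illustration. The real footprint will be larger,
since HBase stores the row key, family, qualifier and timestamp alongside
every cell.

```python
# Back-of-the-envelope row size: value bytes only, ignoring the per-cell
# key overhead (row key, family, qualifier, timestamp) that HBase adds.
MB = 10**6
GB = 10**9


def row_bytes(num_columns: int, value_bytes: int) -> int:
    """Total value bytes held by one row (hypothetical helper)."""
    return num_columns * value_bytes


big = row_bytes(800_000, 1 * MB)  # 800 000 columns of 1MB each
small = row_bytes(800_000, 8)     # 800 000 columns of 8 bytes each

print(f"1MB cells:    {big / GB:.0f} GB in one row")
print(f"8-byte cells: {small / MB:.1f} MB in one row")
```
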
But think about:
- If you have no consistency constraint across those columns, make the CQ
(Column Qualifier) part of the row key so that the data can be split.
- Group values together if they are accessed together. If you always
read 10K at a time, just put those 10K together in a single cell.
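
The two suggestions above could look roughly like this. It is only a
sketch in plain Python, not HBase client code: the key layout (row id, a
0x00 separator, then the qualifier), the function names and the choice of
8-byte big-endian integers are all assumptions made for the example.

```python
import struct


def composite_row_key(row_id: bytes, qualifier: bytes) -> bytes:
    # Assumed layout: <row id> 0x00 <qualifier>. Each former column becomes
    # its own row, so the data can be split between regions.
    # Assumes row_id itself never contains a 0x00 byte.
    return row_id + b"\x00" + qualifier


def pack_values(values):
    # Pack many 8-byte integers into one cell value (big-endian, signed).
    return struct.pack(f">{len(values)}q", *values)


def unpack_values(cell: bytes):
    # Reverse of pack_values: split the cell back into integers.
    return list(struct.unpack(f">{len(cell) // 8}q", cell))


key_a = composite_row_key(b"user42", b"c000001")
key_b = composite_row_key(b"user42", b"c000002")

cell = pack_values(list(range(10_000)))  # 10K values read together
restored = unpack_values(cell)
```

With this layout the keys for one logical row still sort next to each
other, so a prefix scan can read them back together, and the 10K packed
values come back with a single cell read.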

Also, keep in mind that the current MR implementation may OOME if there are
too many columns... A fix is coming, but it is not ready yet.

Now, regarding column families, use them only if you need them. Very
different access patterns or data formats (JPG vs plain text, etc.) can
justify another column family, but most of the time you can do all you
need with a single one...

HTH,

JMS

2015-12-01 6:48 GMT-05:00 Marko Dinic <[email protected]>:

> Hi everyone,
>
> I'm new to HBase and I have a simple question - is 800.000 columns a lot to
> be stored in a single column family?
>
> This data will mostly be processed as MR jobs.
>
> My guess is that it is not, since all the values are stored in single
> Region, so there shouldn't be a problem.
>
> Is there any limit to number of columns in a column family?
>
> --
> Marko Dinic
>
