Yes, I need all of those ints at the same time. And no, there is no streaming.
I have decided to pack 1024 ints into one cell, so that each cell is 4 KB
(1024 x 4 bytes). I am already using LZO on my tables.
I'll run some experiments once I finish implementing both approaches, and
I'll start a thread with the results when I am done.
Thanks for the advice.

Ed.

2011/10/7 Jean-Daniel Cryans <[email protected]>

> (BCC'd common-user@ since this seems strictly HBase related)
>
> Interesting question... And you probably need all those ints at the same
> time right? No streaming? I'll assume no.
>
> So the second solution seems better due to the overhead of storing each
> cell. Basically, storing one int per cell you would end up storing more
> keys than values (size wise).
>
> Another thing is that if you pack enough ints together and there's some
> sort of repetition, you might be able to use LZO compression on that table.
>
> I'd love to hear about your experimentations once you've done them.
>
> J-D
>
> On Mon, Oct 3, 2011 at 10:58 PM, edward choi <[email protected]> wrote:
>
> > Hi,
> >
> > I have a question regarding the performance and column value size.
> > I need to store per row several million integers. ("Several million" is
> > important here)
> > I was wondering which method would be more beneficial performance wise.
> >
> > 1) Store each integer to a single column so that when a row is called,
> > several million columns will also be called. And the user would map each
> > column values to some kind of container (ex: vector, arrayList)
> > 2) Store, for example, a thousand integers into a single column (by
> > concatenating them) so that when a row is called, only several thousand
> > columns will be called along. The user would have to split the column
> > value into 4 bytes and map the split integer to some kind of container
> > (ex: vector, arrayList)
> >
> > I am curious which approach would be better. 1) would call several
> > millions of columns but no additional process is needed.
> > 2) would call only several thousands of columns but additional process
> > is needed.
> > Any advice would be appreciated.
> >
> > Ed
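
For reference, here is a rough sketch of the packing in approach 2, written
in plain Java with no HBase calls. The class name IntCellPacker, the method
names, and the big-endian layout are assumptions made for illustration only.
The 1024-ints-per-cell constant comes from the 4 KB cell size mentioned in
this thread. Writing the resulting byte arrays to a table is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: packs ints into fixed-size cell values.
class IntCellPacker {
    // Assumption: 1024 ints per cell, i.e. 1024 * 4 bytes = 4096 bytes per value.
    static final int INTS_PER_CELL = 1024;

    // Concatenates `count` ints starting at `offset` into one byte[]
    // (ByteBuffer writes big-endian by default).
    static byte[] pack(int[] values, int offset, int count) {
        ByteBuffer buf = ByteBuffer.allocate(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            buf.putInt(values[offset + i]);
        }
        return buf.array();
    }

    // Splits a cell value back into 4-byte ints.
    static int[] unpack(byte[] cell) {
        if (cell.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("cell length is not a multiple of 4");
        }
        int[] out = new int[cell.length / Integer.BYTES];
        ByteBuffer.wrap(cell).asIntBuffer().get(out);
        return out;
    }

    // Splits one row's ints into cell values; the last cell may be shorter.
    static List<byte[]> packRow(int[] values) {
        List<byte[]> cells = new ArrayList<>();
        for (int off = 0; off < values.length; off += INTS_PER_CELL) {
            int count = Math.min(INTS_PER_CELL, values.length - off);
            cells.add(pack(values, off, count));
        }
        return cells;
    }
}
```

With this layout, a row of several million ints becomes a few thousand
cells, and the reader pays one unpack call per cell instead of one key per
int.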
