Setting aside the actual size limit (I'm not entirely sure how that all adds up; it would be affected by concurrent query load and many other factors), placing the large chunk in the Key will increase the size of the index inside the RFile (the file format that actually backs the data in your table). That in turn increases your access times, just to find the offset in the file for the Key you're looking for.

Putting a chunk number in the Key and the actual data in the Value will probably give you much better results. Chunking into 128M pieces should work with a 3G heap; however, I'd err on the side of caution and make many smaller chunks rather than a few very large ones.
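A minimal sketch of the "chunk number in the Key, data in the Value" layout, in plain Java with no Accumulo dependency. The class name `ChunkSketch`, the row id, and the 8-digit zero-padded qualifier are hypothetical choices for illustration. Padding matters because Accumulo sorts keys lexicographically, so "00000002" sorts before "00000010" while "2" would not. The sketch keeps the whole blob in memory; for something the size of a 400M file you would read and emit one chunk at a time instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: splits a blob into fixed-size chunks. Each chunk keeps
// the Key part small (row + zero-padded chunk number) and puts the bytes in the Value.
public class ChunkSketch {

    public static final class Chunk {
        public final String row;        // identifies the whole blob
        public final String qualifier;  // zero-padded chunk number
        public final byte[] value;      // this chunk's bytes

        Chunk(String row, String qualifier, byte[] value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Splits data into chunks of at most chunkSize bytes, numbered 0, 1, 2, ...
    public static List<Chunk> split(String blobId, byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int offset = 0;
        int n = 0;
        while (offset < data.length) {
            int len = Math.min(chunkSize, data.length - offset); // last chunk may be short
            // Locale.ROOT guarantees ASCII digits, so key order matches numeric order.
            String qualifier = String.format(Locale.ROOT, "%08d", n);
            chunks.add(new Chunk(blobId, qualifier, Arrays.copyOfRange(data, offset, offset + len)));
            offset += len;
            n++;
        }
        return chunks;
    }

    // Reassembles chunks, assumed to be supplied in key (chunk-number) order.
    public static byte[] join(List<Chunk> chunks) {
        int total = 0;
        for (Chunk c : chunks) {
            total += c.value.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Chunk c : chunks) {
            System.arraycopy(c.value, 0, out, pos, c.value.length);
            pos += c.value.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "hello, accumulo".getBytes(StandardCharsets.UTF_8); // 15 bytes
        for (Chunk c : split("blob-0001", data, 4)) {
            System.out.println(c.row + " " + c.qualifier + " " + c.value.length);
        }
    }
}
```

With a 15-byte input and 4-byte chunks this yields four chunks of 4, 4, 4 and 3 bytes; at the sizes in the question, a 400M blob cut into 128M pieces gives three full chunks plus a 16M remainder. Each chunk would then be written as one column on an Accumulo `Mutation` for the blob's row (for example `mutation.put(family, qualifier, new Value(bytes))`, where the column family name is up to you); check that signature against the Accumulo version you run.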

On 4/1/13 10:33 AM, David Medinets wrote:
I have a chunk of data (let's say 400M) that I want to store in Accumulo. I can store the chunk in the ColumnFamily or in the Value. Does it make any difference to Accumulo which is used?

My tserver is set up to use -Xmx3g. What is the largest size that seems to work? I have much more memory that I can allocate.

Or should I focus on breaking the data into smaller pieces ... say 128M each?

Thanks.

