You need to factor in the encoding that Accumulo does as well as the compression algorithm you choose. I think we've seen RFile's encoding shrink some datasets down to 1/10th of the original size. I'm not sure we have a general reduction formula for RFile since it depends so much on your schema.

GZ can shrink things pretty well; Snappy tends to be a little faster but a little bigger.
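
If you want to compare the two codecs on your own data, the codec is just a table property you can flip and then compact. A rough Java sketch (the instance name, ZooKeeper host, credentials, and table name below are all placeholders):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.conf.Property;

public class CompressionTest {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own.
    Instance inst = new ZooKeeperInstance("myInstance", "zkhost:2181");
    Connector conn = inst.getConnector("root", new PasswordToken("secret"));

    // table.file.compress.type accepts "gz" (the default), "snappy",
    // "lzo", or "none"; newly written RFiles pick up the new codec.
    conn.tableOperations().setProperty("mytable",
        Property.TABLE_FILE_COMPRESSION_TYPE.getKey(), "snappy");

    // Force existing files to be rewritten so the change shows up on disk.
    conn.tableOperations().compact("mytable", null, null, true, true);
  }
}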

You might be able to approximate that for yourself relatively easily if you have a sliver of your dataset that you can play with.
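
One quick way to measure after you've loaded that sliver is to ask HDFS how big the table's directory ended up. A sketch along these lines (the /accumulo/tables/3 path is made up; use the table ID from the shell's "tables -l" and adjust for your instance volume):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TableFootprint {
  public static void main(String[] args) throws Exception {
    // Placeholder path: Accumulo stores a table's RFiles under a directory
    // named by table ID, not table name.
    Path tableDir = new Path("/accumulo/tables/3");
    FileSystem fs = FileSystem.get(new Configuration());

    ContentSummary cs = fs.getContentSummary(tableDir);
    long logical = cs.getLength();        // bytes after RFile encoding + compression
    long raw = cs.getSpaceConsumed();     // includes HDFS replication (typically 3x)

    System.out.printf("logical: %.1f GB, with replication: %.1f GB%n",
        logical / 1e9, raw / 1e9);
  }
}

Divide the logical size by the raw size of the sliver you ingested and you have a reduction factor for your particular schema.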

Jeremy Kepner wrote:
7TB -> 21TB (3x Hadoop replication), perhaps larger if you have index tables, ...

1M fetches / day ~ 10M entries / day ~ 1000 entries/sec

Typical Accumulo peak is 100K entries/sec/core, so you should be fine on the query side.

How fast do you need to insert the data into Accumulo?
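
For anyone reproducing those numbers, a back-of-the-envelope sketch (the ~10 entries per fetch and the peak-to-average factor are assumptions here, not measurements):

public class SizingSketch {
  public static void main(String[] args) {
    // Inputs from the example above, plus two assumed factors.
    double rawTB = 7.0;
    int hdfsReplication = 3;            // HDFS default replication
    double fetchesPerDay = 1_000_000;
    double entriesPerFetch = 10;        // assumption: ~10 key/value pairs per fetch
    double peakToAverage = 10;          // assumption: load concentrated in a few busy hours

    double onDiskTB = rawTB * hdfsReplication;               // ~21 TB, before index tables
    double entriesPerDay = fetchesPerDay * entriesPerFetch;  // ~10M entries/day
    double avgPerSec = entriesPerDay / 86_400;               // ~116 entries/sec averaged over a day
    double peakPerSec = avgPerSec * peakToAverage;           // ~1,000+ entries/sec at peak

    System.out.printf("disk ~%.0f TB, avg ~%.0f entries/sec, peak ~%.0f entries/sec%n",
        onDiskTB, avgPerSec, peakPerSec);
  }
}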

On Fri, May 22, 2015 at 03:46:20PM +0000, Fagan, Michael wrote:
Josh,

Thanks, I would like to use my performance requirements to derive my HW
requirements.

For example: assume I have a raw 7TB dataset representing 500 million
records with the expectation of 500K-1000K key fetches a day.

I remember there was a tuning webpage circulating around several years
back to help figure out the HW sizing needed to meet performance benchmarks.


Regards,
Mike Fagan



On 5/22/15, 8:55 AM, "Josh Elser" <[email protected]> wrote:

Hi Mike,

We have some info in
http://accumulo.apache.org/1.7/accumulo_user_manual.html#_hardware

What's missing there? Let us know the types of questions you have and we
can expand on the document.

- Josh

Fagan, Michael wrote:
Hi,

Can someone point me to recommendations regarding cluster sizing?

Regards,
Mike Fagan

