Re: benchmarking

Jeremy Kepner Tue, 28 Aug 2018 16:48:49 -0700

Our nodes are usually 20+ cores and 100+ GB RAM.


On Tue, Aug 28, 2018 at 10:18:24PM +0300, guy sharon wrote:
> hi Jeremy,
> 
> Do you have any information on how you configure them and what kind of
> hardware they run on?
> 
> Thanks,
> Guy.
> 
> 
> 
> On Tue, Aug 28, 2018 at 3:44 PM Jeremy Kepner <[email protected]> wrote:
> 
> > FYI, single-node Accumulo instances are our most popular deployment.
> > We have hundreds of them.   Accumulo is so fast that it can replace
> > what would normally require 20 MySQL servers.
> >
> > Regards.  -Jeremy
> >
> > On Tue, Aug 28, 2018 at 07:38:37AM +0000, Sean Busbey wrote:
> > > Hi Guy,
> > >
> > > Apache Accumulo is designed for horizontally scaling out for large scale
> > > workloads that need to do random reads and writes. There's a non-trivial
> > > amount of overhead that comes with a system aimed at doing that on
> > > thousands of nodes.
> > >
> > > If your use case works for a single laptop with such a small number of
> > > entries and exhaustive scans, then Accumulo is probably not the correct
> > > tool for the job.
> > >
> > > For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset
> > > size you can just rely on a file format like Apache Avro:
> > >
> > > busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count 6300000 --schema '{ "type": "record", "name": "entry", "fields": [ { "name": "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
> > > Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
> > > WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > test.seed=1535441473243
> > >
> > > real  0m5.451s
> > > user  0m5.922s
> > > sys   0m0.656s
> > > busbey$ ls -lah ~/Downloads/6.3m_entries.avro
> > > -rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31 /Users/busbey/Downloads/6.3m_entries.avro
> > > busbey$ time java -jar avro-tools-1.7.7.jar tojson ~/Downloads/6.3m_entries.avro | wc -l
> > >  6300000
> > >
> > > real  0m4.239s
> > > user  0m6.026s
> > > sys   0m0.721s
> > >
> > > I'd recommend that you start at >= 5 nodes if you want to look at rough
> > > per-node throughput capabilities.
> > >
> > >
> > > On 2018/08/28 06:59:38, guy sharon <[email protected]> wrote:
> > > > hi Mike,
> > > >
> > > > Thanks for the links.
> > > >
> > > > My current setup is a 4 node cluster (tserver, master, gc, monitor) running
> > > > on Alpine Docker containers on a laptop with an i7 processor (8 cores) with
> > > > 16GB of RAM. As an example I'm running a count of all entries for a table
> > > > with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t
> > > > benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is
> > > > reasonable or not. Seems a little slow to me. What do you think?
> > > >
> > > > BR,
> > > > Guy.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <[email protected]> wrote:
> > > >
> > > > > Hi Guy,
> > > > >
> > > > > Here are a couple links I found.  Can you tell us more about your setup
> > > > > and what you are seeing?
> > > > >
> > > > > https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > > > > https://www.youtube.com/watch?v=Ae9THpmpFpM
> > > > >
> > > > > Mike
> > > > >
> > > > >
> > > > > On Sat, Aug 25, 2018 at 5:09 PM guy sharon <[email protected]> wrote:
> > > > >
> > > > >> hi,
> > > > >>
> > > > >> I've just started working with Accumulo and I think I'm experiencing slow
> > > > >> reads/writes. I'm aware of the recommended configuration. Does anyone know
> > > > >> of any standard benchmarks and benchmarking tools I can use to tell if the
> > > > >> performance I'm getting is reasonable?
> > > > >>
> > > > >>
> > > > >>
> > > >
> >
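
For readers comparing the two timings quoted in this thread, here is a minimal sketch that turns them into entries per second. The entry count (6.3M) and the elapsed times (43 seconds for the Accumulo shell scan, 4.239 seconds for the Avro tojson run) come from the messages above; the function and variable names are made up for illustration. The two runs were on different laptops, and the shell pipeline probably also pays for client startup and text formatting, so treat the ratio as a rough indication only.

```python
def entries_per_second(entries: int, elapsed_seconds: float) -> float:
    """Return throughput in entries per second for one timed run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return entries / elapsed_seconds


# Figures quoted in the thread (names are illustrative, not from any tool).
ENTRIES = 6_300_000
ACCUMULO_SHELL_SCAN_SECONDS = 43.0   # "scan -t benchmark_table -np" | wc -l
AVRO_TOJSON_SECONDS = 4.239          # wall-clock "real" time of avro-tools tojson

accumulo_rate = entries_per_second(ENTRIES, ACCUMULO_SHELL_SCAN_SECONDS)
avro_rate = entries_per_second(ENTRIES, AVRO_TOJSON_SECONDS)
# How many times longer the shell scan took than the Avro read.
slowdown = ACCUMULO_SHELL_SCAN_SECONDS / AVRO_TOJSON_SECONDS

if __name__ == "__main__":
    print(f"Accumulo shell scan: {accumulo_rate:,.0f} entries/s")
    print(f"Avro tojson:         {avro_rate:,.0f} entries/s")
    print(f"Shell scan took about {slowdown:.1f}x as long")
```

On these numbers the shell scan works out to roughly 146 thousand entries per second against roughly 1.5 million for the flat Avro file, about a tenfold gap, which is consistent with the overhead Sean describes for a system built to scale out.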
