Re: benchmarking

Jeremy Kepner Tue, 28 Aug 2018 05:45:03 -0700

FYI, single-node Accumulo instances are our most popular deployment.
We have hundreds of them. Accumulo is so fast that it can replace
what would normally require 20 MySQL servers.


Regards.  -Jeremy

On Tue, Aug 28, 2018 at 07:38:37AM +0000, Sean Busbey wrote:
> Hi Guy,
> 
> Apache Accumulo is designed for horizontally scaling out for large scale 
> workloads that need to do random reads and writes. There's a non-trivial 
> amount of overhead that comes with a system aimed at doing that on thousands 
> of nodes.
> 
> If your use case works for a single laptop with such a small number of 
> entries and exhaustive scans, then Accumulo is probably not the correct tool 
> for the job.
> 
> For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset size 
> you can just rely on a file format like Apache Avro:
> 
> busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count 
> 6300000 --schema '{ "type": "record", "name": "entry", "fields": [ { "name": 
> "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
> Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using 
> builtin-java classes where applicable
> test.seed=1535441473243
> 
> real  0m5.451s
> user  0m5.922s
> sys   0m0.656s
> busbey$ ls -lah ~/Downloads/6.3m_entries.avro 
> -rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31 
> /Users/busbey/Downloads/6.3m_entries.avro
> busbey$ time java -jar avro-tools-1.7.7.jar tojson 
> ~/Downloads/6.3m_entries.avro | wc -l
>  6300000
> 
> real  0m4.239s
> user  0m6.026s
> sys   0m0.721s
> 
> I'd recommend that you start at >= 5 nodes if you want to look at rough 
> per-node throughput capabilities.
> 
> 
> On 2018/08/28 06:59:38, guy sharon <[email protected]> wrote: 
> > hi Mike,
> > 
> > Thanks for the links.
> > 
> > My current setup is a 4 node cluster (tserver, master, gc, monitor) running
> > on Alpine Docker containers on a laptop with an i7 processor (8 cores) with
> > 16GB of RAM. As an example I'm running a count of all entries for a table
> > with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t
> > benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is
> > reasonable or not. Seems a little slow to me. What do you think?
> > 
> > BR,
> > Guy.
> > 
> > 
> > 
> > 
> > On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <[email protected]> wrote:
> > 
> > > Hi Guy,
> > >
> > > Here are a couple links I found.  Can you tell us more about your setup
> > > and what you are seeing?
> > >
> > > https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > > https://www.youtube.com/watch?v=Ae9THpmpFpM
> > >
> > > Mike
> > >
> > >
> > > On Sat, Aug 25, 2018 at 5:09 PM guy sharon <[email protected]>
> > > wrote:
> > >
> > >> hi,
> > >>
> > >> I've just started working with Accumulo and I think I'm experiencing slow
> > >> reads/writes. I'm aware of the recommended configuration. Does anyone
> > >> know of any standard benchmarks and benchmarking tools I can use to tell
> > >> if the performance I'm getting is reasonable?
> > >>
> > >>
> > >>
> >
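
For a rough sense of scale, the wall-clock timings quoted in this thread
can be turned into entries per second. The snippet below is only a sketch:
the function name entries_per_second and the labels are made up for
illustration, and the two figures are not a like-for-like comparison (the
shell scan time includes JVM startup, shell output formatting, and the
pipe to wc, and the two runs were on different laptops).

```python
# Rough throughput from the wall-clock timings quoted in the thread.
# entries_per_second is a hypothetical helper, not part of any Accumulo
# or Avro tooling.

ENTRY_COUNT = 6_300_000  # table size reported by Guy, same count used by Sean


def entries_per_second(count: int, seconds: float) -> float:
    """Return average entries processed per second of wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return count / seconds


# Wall-clock times ("real") taken from the messages above.
timings = {
    "accumulo shell scan | wc -l": 43.0,   # Guy's full-table scan
    "avro-tools tojson | wc -l": 4.239,    # Sean's Avro file read
}

for label, seconds in timings.items():
    rate = entries_per_second(ENTRY_COUNT, seconds)
    print(f"{label}: {rate:,.0f} entries/s")
```

Treat the result as an order-of-magnitude check rather than a benchmark;
a multi-node run measured through the client API would be the fairer test.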
