Re: benchmarking

Mike Miller Tue, 28 Aug 2018 12:01:20 -0700

Measuring scan performance by piping output from the shell is not a
reliable benchmark. Much of the time goes to formatting and printing the
output rather than to the scan itself. You are better off measuring the
difference between configurations using the Batch Scanner API directly.
An example can be found here:
https://accumulo.apache.org/tour/batch-scanner/
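
To make the idea concrete, here is a minimal sketch of timing a scan inside
client code instead of through the shell. ScanTimer, Result and timeScan are
hypothetical names for this illustration and are not part of Accumulo. The
sketch relies only on the fact that Accumulo scanners are Iterable, and it
uses an in-memory list as a stand-in so that it runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo: times one full pass over any
// Iterable and counts the entries it yields.
final class ScanTimer {

    /** Outcome of one timed pass: entries seen and elapsed wall-clock time. */
    static final class Result {
        final long entries;
        final long nanos;

        Result(long entries, long nanos) {
            this.entries = entries;
            this.nanos = nanos;
        }

        double seconds() {
            return nanos / 1e9;
        }
    }

    /** Iterates the source once, counting entries without printing them. */
    static Result timeScan(Iterable<?> source) {
        long start = System.nanoTime();
        long count = 0;
        for (Object ignored : source) {
            count++;
        }
        return new Result(count, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in data (an assumption for this sketch). In a real test this
        // would be a Scanner or BatchScanner created from your connection.
        List<String> standIn = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            standIn.add("row" + i);
        }

        timeScan(standIn);              // warm-up pass so the JIT has compiled the loop
        Result r = timeScan(standIn);   // measured pass
        System.out.printf("%d entries in %.3f s%n", r.entries, r.seconds());
    }
}
```

With a real table you would pass the scanner itself to timeScan and repeat the
measured pass several times, since a single run can be skewed by caching and
JVM warm-up.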



On Tue, Aug 28, 2018 at 2:50 PM guy sharon <[email protected]>
wrote:

> hi Sean,
>
> Thanks for the advice. I tried bringing up a 5 tserver cluster on AWS with
> Muchos (https://github.com/apache/fluo-muchos). My first attempt was
> using servers with 2 vCPU, 8GB RAM (m5d.large on AWS). The Hadoop datanodes
> were colocated with the tservers and the Accumulo master was on the same
> server as the Hadoop namenode. I populated a table with 6M entries using a
> modified version of
> org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter from
> Accumulo (the only thing I modified was the number of entries as it usually
> inserts 50k). I then did a count with "bin/accumulo shell -u root -p secret
> -e "scan -t hellotable -np" | wc -l". That took 15 seconds. I then upgraded
> to m5d.xlarge instances (4 vCPU, 16GB RAM) and got the exact same result, so
> it seems upgrading the servers doesn't help.
>
> Is this expected or am I doing something terribly wrong?
>
> BR,
> Guy.
>
>
>
> On Tue, Aug 28, 2018 at 10:38 AM Sean Busbey <[email protected]> wrote:
>
>> Hi Guy,
>>
>> Apache Accumulo is designed for horizontally scaling out for large scale
>> workloads that need to do random reads and writes. There's a non-trivial
>> amount of overhead that comes with a system aimed at doing that on
>> thousands of nodes.
>>
>> If your use case works for a single laptop with such a small number of
>> entries and exhaustive scans, then Accumulo is probably not the correct
>> tool for the job.
>>
>> For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset
>> size you can just rely on a file format like Apache Avro:
>>
>> busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy --count
>> 6300000 --schema '{ "type": "record", "name": "entry", "fields": [ {
>> "name": "field0", "type": "string" } ] }' ~/Downloads/6.3m_entries.avro
>> Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
>> WARNING: Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> test.seed=1535441473243
>>
>> real    0m5.451s
>> user    0m5.922s
>> sys     0m0.656s
>> busbey$ ls -lah ~/Downloads/6.3m_entries.avro
>> -rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31
>> /Users/busbey/Downloads/6.3m_entries.avro
>> busbey$ time java -jar avro-tools-1.7.7.jar tojson
>> ~/Downloads/6.3m_entries.avro | wc -l
>>  6300000
>>
>> real    0m4.239s
>> user    0m6.026s
>> sys     0m0.721s
>>
>> I'd recommend that you start at >= 5 nodes if you want to look at rough
>> per-node throughput capabilities.
>>
>>
>> On 2018/08/28 06:59:38, guy sharon <[email protected]> wrote:
>> > hi Mike,
>> >
>> > Thanks for the links.
>> >
>> > My current setup is a 4 node cluster (tserver, master, gc, monitor)
>> > running on Alpine Docker containers on a laptop with an i7 processor
>> > (8 cores) with 16GB of RAM. As an example I'm running a count of all
>> > entries for a table with 6.3M entries with "accumulo shell -u root -p
>> > secret -e "scan -t benchmark_table -np" | wc -l" and it takes 43
>> > seconds. Not sure if this is reasonable or not. Seems a little slow to
>> > me. What do you think?
>> >
>> > BR,
>> > Guy.
>> >
>> >
>> >
>> >
>> > On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <[email protected]> wrote:
>> >
>> > > Hi Guy,
>> > >
>> > > Here are a couple links I found.  Can you tell us more about your
>> > > setup and what you are seeing?
>> > >
>> > > https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
>> > > https://www.youtube.com/watch?v=Ae9THpmpFpM
>> > >
>> > > Mike
>> > >
>> > >
>> > > On Sat, Aug 25, 2018 at 5:09 PM guy sharon <[email protected]>
>> > > wrote:
>> > >
>> > >> hi,
>> > >>
>> > >> I've just started working with Accumulo and I think I'm experiencing
>> > >> slow reads/writes. I'm aware of the recommended configuration. Does
>> > >> anyone know of any standard benchmarks and benchmarking tools I can
>> > >> use to tell if the performance I'm getting is reasonable?
>> > >>
>> > >>
>> > >>
>> >
>>
>
