Hi Guy,

Apache Accumulo is designed to scale out horizontally for large workloads 
that need random reads and writes. A system built to do that across thousands 
of nodes carries a non-trivial amount of overhead.

If your use case fits on a single laptop, with such a small number of entries 
and exhaustive scans, then Accumulo is probably not the right tool for the 
job.

For example, on my laptop (i7 2 cores, 8GiB memory) with that dataset size you 
can just rely on a file format like Apache Avro:

busbey$ time java -jar avro-tools-1.7.7.jar random --codec snappy \
    --count 6300000 \
    --schema '{ "type": "record", "name": "entry", "fields": [ { "name": "field0", "type": "string" } ] }' \
    ~/Downloads/6.3m_entries.avro
Aug 28, 2018 12:31:13 AM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
test.seed=1535441473243

real    0m5.451s
user    0m5.922s
sys     0m0.656s
busbey$ ls -lah ~/Downloads/6.3m_entries.avro
-rwxrwxrwx  1 busbey  staff   186M Aug 28 00:31 /Users/busbey/Downloads/6.3m_entries.avro
busbey$ time java -jar avro-tools-1.7.7.jar tojson ~/Downloads/6.3m_entries.avro | wc -l
 6300000

real    0m4.239s
user    0m6.026s
sys     0m0.721s

I'd recommend that you start with at least 5 nodes if you want to look at 
rough per-node throughput capabilities.
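
To put the two timings in this thread side by side, here is a minimal sketch 
that turns them into entries per second. It uses only numbers already quoted 
here (6.3M entries, 43 seconds for the shell scan, 4.239 seconds "real" for 
the Avro tojson run). The function names are made up for illustration. The 
runs were on different laptops, and the Accumulo figure includes shell startup 
and output formatting, so read the result as an order-of-magnitude hint and 
not as a benchmark.

```python
# Rough throughput comparison using only the timings quoted in this thread.
# Assumption: the two runs are NOT directly comparable (different hardware,
# and the Accumulo number includes shell startup and output formatting).

ENTRIES = 6_300_000

ACCUMULO_SHELL_SCAN_SECONDS = 43.0  # accumulo shell "scan -t benchmark_table -np" | wc -l
AVRO_TOJSON_SECONDS = 4.239         # "real" time of avro-tools tojson | wc -l


def entries_per_second(entries: int, seconds: float) -> float:
    """Return average throughput in entries per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return entries / seconds


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times longer the slow run took than the fast run."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    accumulo_rate = entries_per_second(ENTRIES, ACCUMULO_SHELL_SCAN_SECONDS)
    avro_rate = entries_per_second(ENTRIES, AVRO_TOJSON_SECONDS)
    ratio = slowdown(ACCUMULO_SHELL_SCAN_SECONDS, AVRO_TOJSON_SECONDS)

    print(f"Accumulo shell scan: {accumulo_rate:,.0f} entries/s")
    print(f"Avro tojson:         {avro_rate:,.0f} entries/s")
    print(f"Shell scan took about {ratio:.1f}x as long")
```

That works out to roughly 147k entries/s for the shell scan versus roughly 
1.49M entries/s for the Avro file, about a 10x difference on this tiny 
dataset.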


On 2018/08/28 06:59:38, guy sharon <guy.sharon.1...@gmail.com> wrote: 
> hi Mike,
> 
> Thanks for the links.
> 
> My current setup is a 4 node cluster (tserver, master, gc, monitor) running
> on Alpine Docker containers on a laptop with an i7 processor (8 cores) with
> 16GB of RAM. As an example I'm running a count of all entries for a table
> with 6.3M entries with "accumulo shell -u root -p secret  -e "scan -t
> benchmark_table -np" | wc -l" and it takes 43 seconds. Not sure if this is
> reasonable or not. Seems a little slow to me. What do you think?
> 
> BR,
> Guy.
> 
> 
> 
> 
> On Mon, Aug 27, 2018 at 4:43 PM Michael Wall <mjw...@apache.org> wrote:
> 
> > Hi Guy,
> >
> > Here are a couple links I found.  Can you tell us more about your setup
> > and what you are seeing?
> >
> > https://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf
> > https://www.youtube.com/watch?v=Ae9THpmpFpM
> >
> > Mike
> >
> >
> > On Sat, Aug 25, 2018 at 5:09 PM guy sharon <guy.sharon.1...@gmail.com>
> > wrote:
> >
> >> hi,
> >>
> >> I've just started working with Accumulo and I think I'm experiencing slow
> >> reads/writes. I'm aware of the recommended configuration. Does anyone know
> >> of any standard benchmarks and benchmarking tools I can use to tell if the
> >> performance I'm getting is reasonable?
> >>
> >>
> >>
> 
