Guy,
  The ReadData example appears to use a sequential scanner. Can you
change that to a batch scanner and see if there is any improvement [1]?
Also, while you are there, can you remove the log statement or set your
log level so that the trace message isn't printed?
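
For reference, a minimal sketch of what the batch scanner variant might
look like against the 1.7 client API. The instance name, ZooKeeper
address, user, password and table are taken from your command line; the
class name ReadDataBatch and the choice of 4 query threads are my own
assumptions, so adjust them as needed. Note that a batch scanner does
not return entries in sorted order.

```java
import java.util.Collections;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

// Hypothetical class name; not part of the Accumulo examples.
public class ReadDataBatch {
  public static void main(String[] args) throws Exception {
    // Connection details copied from the ReadData command line in this thread.
    Connector conn = new ZooKeeperInstance("muchos", "localhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    // Assumption: 4 query threads; tune this to the number of tablets/tservers.
    int queryThreads = 4;
    BatchScanner scanner =
        conn.createBatchScanner("hellotable", Authorizations.EMPTY, queryThreads);

    long count = 0;
    long start = System.currentTimeMillis();
    try {
      // A single unbounded range covers the whole table (full table scan).
      scanner.setRanges(Collections.singleton(new Range()));
      for (Entry<Key,Value> entry : scanner) {
        count++; // count only; no per-entry logging
      }
    } finally {
      // Batch scanners hold client-side threads and must be closed.
      scanner.close();
    }
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Read " + count + " entries in " + elapsed + " ms");
  }
}
```

Counting entries instead of logging each one keeps the timing focused
on the scan itself.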

In this case we are reading the entirety of the data. If you were to
perform a query, you would likely prefer to run it where the data lives,
on the tablet servers, instead of bringing all of the data back to the
client.

What are your expectations, given that it appears very slow to you? Do
you want faster client-side access to the data? Certainly improvements
could be made -- of that I have no doubt -- but the time to bring 6M
entries to the client is a cost you will incur if you use the ReadData
example.

[1] If you have four tablets it's reasonable to suspect that the RPC
time to access those servers may increase a bit.

On Wed, Aug 29, 2018 at 8:05 AM guy sharon <[email protected]> wrote:
>
> hi,
>
> Continuing my performance benchmarks, I'm still trying to figure out if the 
> results I'm getting are reasonable and why throwing more hardware at the 
> problem doesn't help. What I'm doing is a full table scan on a table with 6M 
> entries. This is Accumulo 1.7.4 with Zookeeper 3.4.12 and Hadoop 2.8.4. The 
> table is populated by 
> org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter modified 
> to write 6M entries instead of 50k. Reads are performed by "bin/accumulo 
> org.apache.accumulo.examples.simple.helloworld.ReadData -i muchos -z 
> localhost:2181 -u root -t hellotable -p secret". Here are the results I got:
>
> 1. 5 tserver cluster as configured by Muchos 
> (https://github.com/apache/fluo-muchos), running on m5d.large AWS machines 
> (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate server. Scan took 
> 12 seconds.
> 2. As above except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
> 3. Splitting the table to 4 tablets causes the runtime to increase to 16 
> seconds.
> 4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
> 5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM), running Amazon 
> Linux. Configuration as provided by Uno (https://github.com/apache/fluo-uno). 
> Total time was 26 seconds.
>
> Offhand I would say this is very slow. I'm guessing I'm making some sort of 
> newbie (possibly configuration) mistake but I can't figure out what it is. 
> Can anyone point me to something that might help me find out what it is?
>
> thanks,
> Guy.
>
>
