This may suggest an issue with the client, either in getting the data to the 
client or in the client itself (although I think there are other 
performance-related changes you could make). I’m curious what the end goal is 
here. Is this a real-world use case? If you are using this type of benchmark to 
evaluate the speed of Accumulo, you will likely not see the same performance 
when you apply it to your own data and real use cases.

From: guy sharon <[email protected]> 
Sent: Wednesday, August 29, 2018 3:13 PM
To: [email protected]
Subject: Re: Accumulo performance on various hardware configurations

hi Mike,

As per Mike Miller's suggestion, I started using 
org.apache.accumulo.examples.simple.helloworld.ReadData from Accumulo with 
debugging turned off and a BatchScanner with 10 threads. I redid all the 
measurements, and although this was 20% faster than using the shell, there was 
no difference once I started playing with the hardware configurations.

Guy.

On Wed, Aug 29, 2018 at 10:06 PM Michael Wall <[email protected]> wrote:

Guy,

Can you go into specifics about how you are measuring this? Are you still using

    bin/accumulo shell -u root -p secret -e "scan -t hellotable -np" | wc -l

as you mentioned earlier in the thread? As Mike Miller suggested, serializing 
that back to the display and then counting 6M entries is going to take some 
time. Try using a BatchScanner directly.
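
A minimal sketch of timing a full scan without the shell's display overhead: the helper below drains any iterable of key/value entries and only counts them, printing nothing per entry. It relies on an Accumulo scanner (including a BatchScanner) being iterable over Map.Entry<Key, Value>, so one could be passed in directly. The names ScanTimer, ScanStats and countEntries are hypothetical, and main uses a synthetic in-memory list instead of a live table so the sketch runs on its own.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ScanTimer {

    /** How many entries a scan returned and how long draining it took. */
    static final class ScanStats {
        final long entries;
        final long elapsedNanos;

        ScanStats(long entries, long elapsedNanos) {
            this.entries = entries;
            this.elapsedNanos = elapsedNanos;
        }

        /** Entries per second over the drain loop; 0 if no time was measured. */
        double entriesPerSecond() {
            return elapsedNanos <= 0 ? 0.0 : entries * 1e9 / elapsedNanos;
        }
    }

    /**
     * Iterates over every entry and counts it, without formatting or printing anything.
     * An Accumulo BatchScanner is {@code Iterable<Map.Entry<Key, Value>>}, so it fits this parameter.
     */
    static ScanStats countEntries(Iterable<? extends Map.Entry<?, ?>> entries) {
        long start = System.nanoTime();
        long count = 0;
        for (Map.Entry<?, ?> ignored : entries) {
            count++;
        }
        return new ScanStats(count, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a table scan (hypothetical data): 50,000 in-memory entries.
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            rows.add(new AbstractMap.SimpleEntry<>(String.format("row_%d", i), "value"));
        }
        ScanStats stats = countEntries(rows);
        System.out.println("entries: " + stats.entries);
        System.out.printf("entries/s: %.0f%n", stats.entriesPerSecond());
    }
}
```

Against a real cluster, the iterable would be the BatchScanner returned by Connector.createBatchScanner("hellotable", Authorizations.EMPTY, 10), with setRanges(Collections.singleton(new Range())) called before iterating and close() afterwards. Timing only the drain loop keeps connection setup out of the number being compared across hardware configurations.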

Mike

On Wed, Aug 29, 2018 at 2:56 PM guy sharon <[email protected]> wrote:

Yes, I tried the high-performance configuration, which translates to a 4G heap 
size, but that didn't affect performance. Neither did setting 
table.scan.max.memory to 4096k (the default is 512k). Even if I accept that the 
read performance here is reasonable, I don't understand why none of the 
hardware configuration changes (except going to 48 cores, which made things 
worse) made any difference.

On Wed, Aug 29, 2018 at 8:33 PM Mike Walch <[email protected]> wrote:

Muchos does not automatically change its Accumulo configuration to take 
advantage of better hardware. However, it does have a performance profile 
setting in its configuration (see link below) where you can select a profile 
(or create your own) based on the hardware you are using.

https://github.com/apache/fluo-muchos/blob/master/conf/muchos.props.example#L94

On Wed, Aug 29, 2018 at 11:35 AM Josh Elser <[email protected]> wrote:

Does Muchos actually change the Accumulo configuration when you are 
changing the underlying hardware?

On 8/29/18 8:04 AM, guy sharon wrote:
> hi,
> 
> Continuing my performance benchmarks, I'm still trying to figure out if 
> the results I'm getting are reasonable and why throwing more hardware at 
> the problem doesn't help. What I'm doing is a full table scan on a table 
> with 6M entries. This is Accumulo 1.7.4 with ZooKeeper 3.4.12 and Hadoop 
> 2.8.4. The table is populated by 
> org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter 
> modified to write 6M entries instead of 50k. Reads are performed by 
> "bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i 
> muchos -z localhost:2181 -u root -t hellotable -p secret". Here are the 
> results I got:
> 
> 1. 5 tserver cluster as configured by Muchos 
> (https://github.com/apache/fluo-muchos), running on m5d.large AWS 
> machines (2vCPU, 8GB RAM) running CentOS 7. Master is on a separate 
> server. Scan took 12 seconds.
> 2. As above except with m5d.xlarge (4vCPU, 16GB RAM). Same results.
> 3. Splitting the table into 4 tablets causes the runtime to increase to 16 
> seconds.
> 4. 7 tserver cluster running m5d.xlarge servers. 12 seconds.
> 5. Single node cluster on m5d.12xlarge (48 cores, 192GB RAM), running 
> Amazon Linux. Configuration as provided by Uno 
> (https://github.com/apache/fluo-uno). Total time was 26 seconds.
> 
> Offhand I would say this is very slow. I'm guessing I'm making some sort 
> of newbie (possibly configuration) mistake but I can't figure out what 
> it is. Can anyone point me to something that might help me find out what 
> it is?
> 
> thanks,
> Guy.
> 
> 

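To make the scaling question concrete, the sketch below turns the reported wall-clock times into entries per second for the 6M-entry table, both in aggregate and divided evenly across tablet servers. It takes "Same results" for the m5d.xlarge run to mean 12 seconds and leaves out the 4-tablet run, whose server count is not stated. The even per-server split is a simplifying assumption, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanThroughput {

    /** Number of entries in the benchmark table, as reported in the thread. */
    static final long ENTRIES = 6_000_000L;

    /** One reported run: a label, the wall-clock scan time, and the tablet server count. */
    static final class Run {
        final String label;
        final double seconds;
        final int tservers;

        Run(String label, double seconds, int tservers) {
            this.label = label;
            this.seconds = seconds;
            this.tservers = tservers;
        }

        /** Aggregate throughput seen by the single scanning client. */
        double entriesPerSecond() {
            return ENTRIES / seconds;
        }

        /** Aggregate throughput split evenly across tablet servers (assumes a uniform spread). */
        double entriesPerSecondPerTserver() {
            return entriesPerSecond() / tservers;
        }
    }

    public static void main(String[] args) {
        // Scan times reported in the original message.
        List<Run> runs = new ArrayList<>();
        runs.add(new Run("5 tservers, m5d.large", 12, 5));
        runs.add(new Run("5 tservers, m5d.xlarge", 12, 5));
        runs.add(new Run("7 tservers, m5d.xlarge", 12, 7));
        runs.add(new Run("1 node, m5d.12xlarge", 26, 1));
        for (Run r : runs) {
            System.out.printf("%-24s %9.0f entries/s total, %9.0f per tserver%n",
                    r.label, r.entriesPerSecond(), r.entriesPerSecondPerTserver());
        }
    }
}
```

For the 12-second runs the aggregate rate is 500,000 entries/s whether there are 5 or 7 tablet servers, so the per-server share falls from 100,000 to about 71,429 entries/s. An aggregate rate that stays flat as servers are added is consistent with the bottleneck sitting outside the tablet servers, for example in the single scanning client, as suggested at the top of the thread.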