Re: Ec2 Stress Results

2011-05-11 Thread Alex Araujo

On 5/9/11 9:49 PM, Jonathan Ellis wrote:

On Mon, May 9, 2011 at 5:58 PM, Alex Araujo cassandra-us...@alex.otherinbox.com wrote:

How many replicas are you writing?

Replication factor is 3.

So you're actually spot on the predicted numbers: you're pushing
20k*3=60k raw rows/s across your 4 machines.

You might get another 10% or so from increasing memtable thresholds,
but bottom line is you're right around what we'd expect to see.
Furthermore, CPU is the primary bottleneck which is what you want to
see on a pure write workload.

That makes a lot more sense.  I upgraded the cluster to 4 m2.4xlarge 
instances (68GB of RAM/8 CPU cores) in preparation for application 
stress tests and the results were impressive @ 200 threads per client:


+--------------+--------------+--------------+------------+---------+---------+------------+------------+--------------+
| Server Nodes | Client Nodes | --keep-going |  Columns   | Client  |  Total  | Rep Factor | Test Rate  | Cluster Rate |
|              |              |              |            | Threads | Threads |            | (writes/s) |  (writes/s)  |
+==============+==============+==============+============+=========+=========+============+============+==============+
|      4       |      3       |      N       |  10000000  |   200   |   600   |     3      |   44644    |    133931    |
+--------------+--------------+--------------+------------+---------+---------+------------+------------+--------------+
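
As a rough sanity check on that table, a small sketch (assumptions: the Test Rate column is the per-client-node figure, which lines up with Cluster Rate ~= Test Rate x 3, and the raw-replica-write math follows the quoted explanation above):

# Back-of-envelope check of the table above.
per_client_rate = 44644     # Test Rate (writes/s per client node)
client_nodes = 3
rep_factor = 3

cluster_rate = per_client_rate * client_nodes      # 133932, vs. 133931 reported
raw_replica_writes = cluster_rate * rep_factor     # ~402k replica row writes/s across the 4 servers
print(cluster_rate, raw_replica_writes)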

The issue I'm seeing with app stress tests is that the rate will be
comparable/acceptable at first (~100k w/s) and will degrade considerably
(to ~48k w/s) until a flush and restart.  CPU usage will correspondingly be
high at first (500-700%) and taper down to 50-200%.  My data model is
pretty standard (the type annotations below are pseudo-types):


Users (Column):
UserId (32CharHash): {
    email (String): a...@b.com,
    first_name (String): John,
    last_name (String): Doe
}

UserGroups (SuperColumn):
GroupId (UUID): {
    UserId (32CharHash): {
        date_joined (DateTime): 2011-05-10 13:14.789,
        date_left (DateTime): 2011-05-11 13:14.789,
        active (short): 0|1
    }
}

UserGroupTimeline (Column):
GroupId (UUID): {
    date_joined (TimeUUID): UserId (32CharHash)
}

UserGroupStatus (Column):
CompositeId('GroupIdUUID:UserId32CharHash'): {
    active (short): 0|1
}

Every new User has a row in Users and a ColumnOrSuperColumn in the other
3 CFs (4 operations total).  One notable difference is that the RAID0 on
this instance type (surprisingly) only contains two ephemeral volumes
and appears a bit more saturated in iostat, although not enough to
clearly stand out as the bottleneck.  Is the bottleneck in this scenario
likely memtable flush and/or commitlog rotation settings?
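
For reference, a minimal pycassa sketch of what those four operations per new User could look like (the keyspace name, host, helper name, and date formatting are assumptions; only the column family layout comes from the model above):

import time
import uuid

import pycassa

# Keyspace name and host are placeholders for this sketch.
pool = pycassa.ConnectionPool('MyKeyspace', server_list=['node1:9160'])

users    = pycassa.ColumnFamily(pool, 'Users')
groups   = pycassa.ColumnFamily(pool, 'UserGroups')          # super column family
timeline = pycassa.ColumnFamily(pool, 'UserGroupTimeline')   # assumes a TimeUUIDType comparator
status   = pycassa.ColumnFamily(pool, 'UserGroupStatus')

def add_user_to_group(user_id, group_id, email, first_name, last_name):
    # One logical 'new User' event = the 4 writes described above.
    now = time.strftime('%Y-%m-%d %H:%M:%S')
    users.insert(user_id, {'email': email,
                           'first_name': first_name,
                           'last_name': last_name})
    groups.insert(group_id, {user_id: {'date_joined': now, 'active': '1'}})
    timeline.insert(group_id, {uuid.uuid1(): user_id})
    status.insert('%s:%s' % (group_id, user_id), {'active': '1'})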


RF = 2; ConsistencyLevel = One; -Xmx = 6GB; concurrent_writes: 64; all 
other settings are the defaults.  Thanks, Alex.


Re: Ec2 Stress Results

2011-05-11 Thread Adrian Cockcroft
Hi Alex,

This has been a useful thread; we've been comparing your numbers with
our own tests.

Why did you choose four big instances rather than more smaller ones?

For $8/hr you get four m2.4xl with a total of 8 disks.
For $8.16/hr you could have twelve m1.xl with a total of 48 disks, 3x the
disk space, a bit less total RAM, and much more CPU.

When an instance fails, you have a 25% loss of capacity with 4 or an
8% loss of capacity with 12.
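
To make that concrete, a quick sketch of the math (per-instance prices and disk counts are inferred from the totals above, not quoted from a price list):

# Fleet comparison implied by the figures above.
fleets = {
    'm2.4xlarge': {'count': 4,  'price_per_hr': 2.00, 'disks_each': 2},
    'm1.xlarge':  {'count': 12, 'price_per_hr': 0.68, 'disks_each': 4},
}

for name, f in fleets.items():
    cost  = f['count'] * f['price_per_hr']    # $8.00/hr vs. $8.16/hr
    disks = f['count'] * f['disks_each']      # 8 vs. 48 ephemeral disks
    loss  = 100.0 / f['count']                # 25% vs. ~8% capacity lost per failed node
    print('%-10s $%.2f/hr  %2d disks  %4.1f%% capacity lost per node failure'
          % (name, cost, disks, loss))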

I don't think it makes sense (especially on EC2) to run fewer than 6
instances; we are mostly starting at 12-15.
We can also spread the instances over three EC2 availability zones,
with RF=3 and one copy of the data in each zone.

Cheers
Adrian





Re: Ec2 Stress Results

2011-05-11 Thread Alex Araujo

Hey Adrian -

Why did you choose four big instances rather than more smaller ones?

Mostly to see the impact of additional CPUs on a write-only load.  The
portion of the application we're migrating from MySQL is very write
intensive.  The other 8-core option was c1.xl with 7GB of RAM.  I will
very likely need more than that once I add reads as some things can 
benefit significantly from the row cache.  I also thought that m2.4xls 
would come with 4 disks instead of two.

For $8/hr you get four m2.4xl with a total of 8 disks.
For $8.16/hr you could have twelve m1.xl with a total of 48 disks, 3x
disk space, a bit less total RAM and much more CPU

When an instance fails, you have a 25% loss of capacity with 4 or an
8% loss of capacity with 12.

I don't think it makes sense (especially on EC2) to run fewer than 6
instances, we are mostly starting at 12-15.
We can also spread the instances over three EC2 availability zones,
with RF=3 and one copy of the data in each zone.

Agree on all points.  The reason I'm keeping the cluster small now is to
more easily monitor what's going on/find where things break down.  
Eventually it will be an 8+ node cluster spread across AZs as you 
mentioned (and likely m2.4xls as they do seem to provide the most 
value/$ for this type of system).


I'm interested in hearing about your experience(s) and will continue to 
share mine.  Alex.


Re: Ec2 Stress Results

2011-05-09 Thread Alex Araujo

On 5/6/11 9:47 PM, Jonathan Ellis wrote:

On Fri, May 6, 2011 at 5:13 PM, Alex Araujo
cassandra-us...@alex.otherinbox.com  wrote:

I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
available memory).

This is going to make GC pauses larger for no good reason.

Good point - only doing writes at the moment.  I will revert the change
and raise this conservatively once I add reads to the mix.



raised
concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation in
'Cassandra: The Definitive Guide'

That's never been a good recommendation.

It seemed to contradict the '8 * number of cores' rule of thumb, which
would give 8 * 4 cores = 32 on these instances.  I set that back to the
default of 32.



Based on the above, would I be correct in assuming that frequent memtable
flushes and/or commitlog I/O are the likely bottlenecks?

Did I miss where you said what CPU usage was?

I observed a consistent 200-350% initially; 300-380% once 'hot' for all
runs.  Here is an average case sample:


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15108 cassandr  20   0 5406m 4.5g  15m S  331 30.4  89:32.50 jsvc


How many replicas are you writing?


Replication factor is 3.


Recent testing suggests that putting the commitlog on the raid0 volume
is better than on the root volume on ec2, since the root isn't really
a separate device.

I migrated the commitlog to the raid0 volume and retested with the above 
changes.  I/O appeared more consistent in iostat.  Here's an average 
case (%util in the teens):


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36.84    4.05   13.97    3.04   18.42   23.68

Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdap1     0.00    0.00   0.00    0.00    0.00      0.00      0.00      0.00    0.00   0.00   0.00
xvdb       0.00    0.00   0.00  222.00    0.00  18944.00     85.33     13.80   62.16   0.59  13.00
xvdc       0.00    0.00   0.00  231.00    0.00  19480.00     84.33      5.80   25.11   0.78  18.00
xvdd       0.00    0.00   0.00  228.00    0.00  19456.00     85.33     17.43   76.45   0.57  13.00
xvde       0.00    0.00   0.00  229.00    0.00  19464.00     85.00     10.41   45.46   0.44  10.00
md0        0.00    0.00   0.00  910.00    0.00  77344.00     84.99      0.00    0.00   0.00   0.00


and worst case (%util above 60):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44.33    0.00   24.54    0.82   15.46   14.85

Device:  rrqm/s  wrqm/s    r/s      w/s  rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdap1     0.00    1.00   0.00     4.00    0.00      40.00     10.00      0.15   37.50  22.50   9.00
xvdb       0.00    0.00   0.00   427.00    0.00   36440.00     85.34     54.12  147.85   1.69  72.00
xvdc       0.00    0.00   1.00   295.00    8.00   25072.00     84.73     34.56   84.32   2.13  63.00
xvdd       0.00    0.00   0.00   355.00    0.00   30296.00     85.34     94.49  257.61   2.17  77.00
xvde       0.00    0.00   0.00   373.00    0.00   31768.00     85.17     68.50  189.33   1.88  70.00
md0        0.00    0.00   1.00  1418.00    8.00  120824.00     85.15      0.00    0.00   0.00   0.00
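
For scale, a small sketch converting the md0 (raid0) write figures above from sectors/s to MB/s, assuming iostat's standard 512-byte sectors:

for label, wsec_per_s in [('average case', 77344.0), ('worst case', 120824.0)]:
    mb_per_s = wsec_per_s * 512 / 1024 / 1024
    print('%s: %.1f MB/s' % (label, mb_per_s))   # ~37.8 MB/s and ~59.0 MB/s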


Overall, results were roughly the same.  The most noticeable difference
was that there were no timeouts until the number of client threads
reached 350 (previously 200):


+--------------+--------------+--------------+------------+---------+---------+---------------+
| Server Nodes | Client Nodes | --keep-going |  Columns   | Client  |  Total  | Combined Rate |
|              |              |              |            | Threads | Threads |  (writes/s)   |
+==============+==============+==============+============+=========+=========+===============+
|      4       |      3       |      N       |  10000000  |   150   |   450   |     21241     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   200   |   600   |     21536     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   250   |   750   |     19451     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   300   |   900   |     19741     |
+--------------+--------------+--------------+------------+---------+---------+---------------+

Those results are after I compiled/deployed the latest cassandra-0.7 
with the patch for 
https://issues.apache.org/jira/browse/CASSANDRA-2578.  Thoughts?





Re: Ec2 Stress Results

2011-05-06 Thread Alex Araujo
Pardon the long delay - went on holiday and got sidetracked before I 
could return to this project.


@Joaquin - The DataStax AMI uses a RAID0 configuration on an instance 
store's ephemeral drives.


@Jonathan - you were correct about the client node being the
bottleneck.  I set up 3 XL client instances to run contrib/stress back on
the 4 node XL Cassandra cluster and incrementally raised the number of
threads on the clients until I started seeing timeouts.


I set the following mem settings for the client JVMs: -Xms2G -Xmx10G

I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
available memory).  I used the default AMI cassandra.yaml settings for
the Cassandra nodes until timeouts started appearing (at 200 threads per
client; 600 total threads), and then raised concurrent_writes to 300
based on a (perhaps arbitrary?) recommendation in 'Cassandra: The
Definitive Guide' to scale that number with the number of client
threads.  The client nodes were in the same AZ as the Cassandra nodes,
and I set the --keep-going option on the clients for every other run
with 200 or more threads.


Results
+--------------+--------------+--------------+------------+---------+---------+---------------+
| Server Nodes | Client Nodes | --keep-going |  Columns   | Client  |  Total  | Combined Rate |
|              |              |              |            | Threads | Threads |  (writes/s)   |
+==============+==============+==============+============+=========+=========+===============+
|      4       |      3       |      N       |  10000000  |   25    |   75    |     13771     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   50    |   150   |     16853     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   75    |   225   |     18511     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  10000000  |   150   |   450   |     20013     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  7574241   |   200   |   600   |     22935     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      Y       |  10000000  |   200   |   600   |     19737     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  9843677   |   250   |   750   |     20869     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      Y       |  10000000  |   250   |   750   |     21217     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      N       |  5015711   |   300   |   900   |     24177     |
+--------------+--------------+--------------+------------+---------+---------+---------------+
|      4       |      3       |      Y       |  10000000  |   300   |   900   |     206134    |
+--------------+--------------+--------------+------------+---------+---------+---------------+

Other Observations
* `vmstat` showed no swapping during runs
* `iostat -x` always showed 0's for  avgqu-sz, await, and %util on the 
/raid0 (data) partition; 0-150, 0-334ms, and 0-60% respectively for the 
/ (commitlog) partition
* %steal from iostat ranged from 8-26% every run (one node had an almost 
constant 26% while the others averaged closer to 10%)
* `nodetool tpstats` never showed more than 10's of Pending ops in 
RequestResponseStage; no more than 1-2K Pending ops in MutationStage.  
Usually a single node would register ops; the others would be 0's
* After all test runs, Memtable Switch Count was 1385 for 
Keyspace1.Standard1
* Load average on the Cassandra nodes was very high the entire time,
especially for tests where each client ran more than 100 threads.  Here's
one sample @ 200 threads each (600 total):


[i-94e8d2fb] alex@cassandra-qa-1:~$ uptime
17:18:26 up 1 day, 19:04,  2 users,  load average: 20.18, 15.20, 12.87
[i-a0e5dfcf] alex@cassandra-qa-2:~$ uptime
17:18:26 up 1 day, 18:52,  2 users,  load average: 22.65, 25.60, 21.71
[i-92dde7fd] alex@cassandra-qa-3:~$ uptime
17:18:26 up 1 day, 18:44,  2 users,  load average: 24.19, 28.29, 20.17
[i-08caf067] alex@cassandra-qa-4:~$ uptime
17:18:26 up 1 day, 18:37,  2 users,  load average: 31.74, 20.99, 13.97

* Average resource utilization on the client nodes was between 10-80% 
CPU; 5-25% memory depending on # of threads.  Load average was always 
negligible (presumably because there was no I/O)
* After a few runs and truncate operations on Keyspace1.Standard1, the 
ring became unbalanced before runs:


[i-94e8d2fb] alex@cassandra-qa-1:~$ nodetool -h localhost ring
Address         Status State   Load       Owns    Token
                                                   127605887595351923798765477786913079296
10.240.114.143  Up     Normal  2.1 GB     25.00%  0
10.210.154.63   Up     Normal  330.19 MB  25.00%  

Re: Ec2 Stress Results

2011-05-06 Thread Jonathan Ellis
On Fri, May 6, 2011 at 5:13 PM, Alex Araujo
cassandra-us...@alex.otherinbox.com wrote:
 I raised the default MAX_HEAP setting from the AMI to 12GB (~80% of
 available memory).

This is going to make GC pauses larger for no good reason.

 raised
 concurrent_writes to 300 based on a (perhaps arbitrary?) recommendation in
 'Cassandra: The Definitive Guide'

That's never been a good recommendation.

 Based on the above, would I be correct in assuming that frequent memtable
 flushes and/or commitlog I/O are the likely bottlenecks?

Did I miss where you said what CPU usage was?

How many replicas are you writing?

Recent testing suggests that putting the commitlog on the raid0 volume
is better than on the root volume on ec2, since the root isn't really
a separate device.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Ec2 Stress Results

2011-04-25 Thread Joaquin Casares
Did the images have EBS storage or Instance Store storage?

Typically EBS volumes aren't the best to be benchmarking against:
http://www.mail-archive.com/user@cassandra.apache.org/msg11022.html

Joaquin Casares
DataStax
Software Engineer/Support



On Wed, Apr 20, 2011 at 5:12 PM, Jonathan Ellis jbel...@gmail.com wrote:

 A few months ago I was seeing 12k writes/s on a single EC2 XL. So
 something is wrong.

 My first suspicion is that your client node may be the bottleneck.




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Ec2 Stress Results

2011-04-20 Thread Alex Araujo
Does anyone have any Ec2 benchmarks/experiences they can share?  I am 
trying to get a sense for what to expect from a production cluster on 
Ec2 so that I can compare my application's performance against a sane 
baseline.  What I have done so far is:


1. Launched a 4 node cluster of m1.xlarge instances in the same
availability zone using PyStratus 
(https://github.com/digitalreasoning/PyStratus).  Each node has the 
following specs (according to Amazon):

15 GB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
1,690 GB instance storage
64-bit platform

2. Changed the default PyStratus directories in order to have commit 
logs on the root partition and data files on ephemeral storage:

commitlog_directory: /var/cassandra-logs
data_file_directories: [/mnt/cassandra-data]

3. Gave each node 10GB of MAX_HEAP; 1GB HEAP_NEWSIZE in
conf/cassandra-env.sh


4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100`
on a separate m1.large instance:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9832712,7120,7120,0.004948514851485148,842
9907616,7490,7490,0.0043189949802413755,852
9978357,7074,7074,0.004560353967289125,863
10000000,2164,2164,0.004065933558194335,867

5. Truncated Keyspace1.Standard1:
# /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
Connected to: Test Cluster on x.x.x.x/9160
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] truncate Standard1;
null

6. Expanded the cluster to 8 nodes using PyStratus and sanity checked
using nodetool:

# /usr/local/apache-cassandra/bin/nodetool -h localhost ring
Address   Status State   Load     Owns     Token
x.x.x.x   Up     Normal  1.3 GB   12.50%   21267647932558653966460912964485513216
x.x.x.x   Up     Normal  3.06 GB  12.50%   42535295865117307932921825928971026432
x.x.x.x   Up     Normal  1.16 GB  12.50%   63802943797675961899382738893456539648
x.x.x.x   Up     Normal  2.43 GB  12.50%   85070591730234615865843651857942052864
x.x.x.x   Up     Normal  1.22 GB  12.50%   106338239662793269832304564822427566080
x.x.x.x   Up     Normal  2.74 GB  12.50%   127605887595351923798765477786913079296
x.x.x.x   Up     Normal  1.22 GB  12.50%   148873535527910577765226390751398592512
x.x.x.x   Up     Normal  2.57 GB  12.50%   170141183460469231731687303715884105728


7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100`
on a separate m1.large instance again:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9880360,9649,9649,0.003210443956226165,720
9942718,6235,6235,0.003206934154398794,731
9997035,5431,5431,0.0032615939761032457,741
10000000,296,296,0.002660033726812816,742

In a nutshell, 4 nodes inserted at 11,534 writes/sec and 8 nodes 
inserted at 13,477 writes/sec.
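
For anyone double-checking, those figures are just the 10,000,000 total inserts divided by the elapsed seconds reported on the last line of each stress run:

# Last stress line is: total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
total_inserts = 10000000
print(total_inserts / 867.0)   # ~11,534 writes/s for the 4-node run
print(total_inserts / 742.0)   # ~13,477 writes/s for the 8-node run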


Those numbers seem a little low to me, but I don't have anything to
compare to.  I'd like to hear others' opinions before I spin my wheels
with number of nodes, threads, memtable, memory, and/or GC settings.
Cheers, Alex.